Sometime in the last 18 months, the volume of code your team ships stopped being limited by how fast engineers could write it. Claude Code, Cursor, Copilot — pick your tool. The output is real, it's fast, and it mostly works. The productivity gains are genuine. This isn't a critique of AI coding tools.
This is a question about what happens next. Specifically: who is reading that code?
The confidence problem
AI-generated code has a specific failure mode that makes it harder to review than human-written code: it looks right. It's clean, it's idiomatic, it follows conventions, it has variable names that make sense. There's no telltale sign of fatigue or inexperience in the style. The tests pass.
What's missing is often structural, not syntactic. An API contract that was documented one way in 2022 and works differently now — the model was trained on the old behavior. An error path that doesn't exist because the agent assumed the happy path was the only path worth modeling. A security boundary that was never enforced because the surrounding code didn't make it obvious one was needed.
These aren't bugs that grep catches. They're decisions — or the absence of decisions — that accumulate across a codebase over months.
Why junior engineers can't review it
The honest version: junior engineers review code by checking whether it matches what they know. If the code looks like code they've seen before, it passes. If it uses a pattern they recognize, it passes. This is a reasonable heuristic for human-written code, where the person who wrote it also carried the context.
Agent-written code breaks that heuristic. It matches every surface-level pattern while potentially missing the thing that only becomes obvious three months later under production load, or during a security audit, or when a new engineer tries to extend it and finds that nothing is where it should be.
A junior engineer reviewing a Claude Code PR is not slower than a senior engineer. They're reading a different document. They see the code that's there. A senior engineer also sees the code that isn't — and the place it was omitted and why.
What actually needs reviewing
When we review AI-generated code, these are the things we look for that junior engineers consistently miss:
Architecture drift. One agent session doesn't change your architecture. Fifty of them do. Agents write code that works locally — that solves the immediate problem in isolation. Over time, this produces a codebase where the same problem is solved five different ways, none of them wrong, none of them the one you'd have chosen if you'd been paying attention.
Hallucinated contracts. Models are trained on documentation that may be months or years out of date. We've seen agent-written code that calls library functions with deprecated argument signatures, relies on HTTP behavior that changed in a client upgrade, and assumes database semantics that were true in an older version of the ORM. The code is syntactically correct and will fail at runtime in a way that takes time to diagnose.
Missing error handling at decision points. Agents write for the path that was described. If the prompt didn't say "and handle the case where the payment provider returns a 429," that case doesn't get handled. Distributed systems fail in ways that are hard to anticipate unless you've seen them fail. That's experience, and models don't have it the way a senior engineer does.
Security surface opened by abstraction. Clean abstractions can hide security requirements. An agent that builds a nice data access layer might not thread authorization checks through the new query paths because the existing ones didn't make the pattern obvious. The code is elegant. The auth gap is real.
The scalability question
There's an argument that goes: you can't hire senior engineers fast enough to keep pace with agent-written code, so you have to accept some review debt. This is true. You can't review every PR with the depth it deserves.
The right response isn't to stop reviewing. It's to review differently. Risk-ranked, pattern-focused, with clear guardrails established upfront that let agents operate within known constraints. Architecture decisions made deliberately, before agents are given the space to make them by default.
The teams that are handling this well have a senior engineer whose primary job is knowing what the agents are producing — not fixing individual bugs, but maintaining the coherence of the system across hundreds of agent sessions. That's a different job than it was two years ago. It requires different skills, a different cadence, a different kind of attention.
What to do about it
If your team is shipping significant amounts of agent-written code and doesn't have a clear answer to "who is responsible for the architecture that emerges from it," that's a gap worth closing before it becomes expensive.
The options aren't binary. You don't need to hire a new senior engineer or slow down the agents. You need a qualified opinion on what's accumulating — and a set of guardrails that let agents keep moving without making decisions they shouldn't.
This is what we do. See how it works →
If you want a read on the existing codebase before setting up ongoing review, start with Eloquent Lens — a full cold read of your repository delivered in 2 business days.