AI Wrote 41% of Your Codebase While Your Reviewer Was Updating Their LinkedIn
Every office has one. The shared kitchen. You know the scene. Monday morning, you walk in to make coffee, and the microwave looks like someone heated a bowl of marinara sauce without a lid — last Thursday. The sink has three mugs that belong to nobody. The fridge contains a Tupperware that might have been salad in February but has since evolved into something that would interest a biology lab.
Nobody made this mess on purpose. Everybody made this mess by not cleaning up.
That’s your codebase right now. Except instead of marinara, it’s AI-generated code. And the person scrubbing the microwave? That’s your code reviewer. And they are so tired.
The Numbers Are Ugly
Let’s get the stats out of the way because they’re the kind of numbers that make engineering managers stare at the ceiling at 2 AM.
41% of all new code pushed in 2025 originated from AI-assisted generation. Not 10%. Not a “helpful suggestion here and there.” Forty-one percent. Monthly code pushes exceeded 82 million, and merged pull requests hit 43 million. The factory floor is roaring.
But here’s the plot twist that every factory inspector sees coming: PR volume increased 20% year-over-year while incidents per PR jumped 23.5%. The median PR size grew 33% — from 57 to 76 lines changed. More code, bigger chunks, worse quality.
AI-generated code produces 1.7 times more issues than human-written code. That’s 10.83 issues per PR versus 6.45. Logic errors are 1.75 times more frequent. Security vulnerabilities are 1.57 times more likely. And XSS vulnerabilities? 2.74 times more common.
This is the software equivalent of a restaurant that doubled its output by hiring a chef who doesn’t wash their hands. The plates are going out faster. The health inspector is on the way.
”I Was the First Human to Ever See This Code”
In March 2026, researchers from Heidelberg University, the University of Melbourne, and Singapore Management University published a study that should be required reading for every engineering org on the planet. They analyzed 1,154 posts across 15 discussion threads on Reddit and Hacker News, coded them into 15 categories across three thematic clusters, and gave the whole mess a name that sticks:
A tragedy of the commons.
The original tragedy of the commons is the story of a shared pasture. Every farmer adds one more sheep because the grazing is free. Individually rational, collectively catastrophic. The grass dies, the sheep starve, everyone loses.
Replace “sheep” with “AI-generated pull requests” and “pasture” with “your team’s review capacity,” and you’ve got modern software development.
Individual developers ship faster. Their metrics look great. Their sprint velocity is through the roof. But the reviewers — the people who actually have to read, understand, and approve this code — are drowning. One team in the study reported receiving 30 pull requests per day with just 6 reviewers. That’s five PRs per person per day, on top of their own work.
One reviewer put it perfectly: they described the experience of reviewing AI code as being “the first human being to ever lay eyes on this code.” Nobody wrote it with intent. Nobody thought through the architecture. Nobody considered edge cases. A machine hallucinated it into existence, and now a human has to decide if it’s safe to ship to millions of users.
That’s not code review. That’s archaeology.
The Dirty Dishes Have Tells
Here’s something darkly funny from the study. Reviewers have developed a whole set of heuristics for spotting AI-generated code, like forensic analysts studying handwriting.
Emojis in code comments are considered a near-certain giveaway. If your inline comment says // Calculate the total price of items in the cart with a shopping cart emoji, congratulations — you’ve been made.
Step-by-step comment patterns are another red flag. When every three lines of code are preceded by // Step 1: Initialize the array, // Step 2: Loop through items, // Step 3: Return the result — that’s not a careful developer being thorough. That’s a language model performing its understanding of what “well-commented code” looks like.
Bloated style. AI code tends to be verbose in a specific way — overly defensive, full of redundant checks, and padded with variables that exist for “clarity” but actually just add noise. It’s the coding equivalent of answering a yes-or-no question with a five-paragraph essay.
Unicode artifacts. Curly quotes where straight quotes should be. Em dashes in variable names. The kind of typographic tells that happen when code passes through a system that’s fundamentally a text generator pretending to be a programmer.
Reviewers shouldn’t need to be forensic linguists. But here we are.
curl Said “Enough” and Shut the Whole Thing Down
If you want a concrete example of what AI slop does to open source, look at what happened to curl.
curl is one of the most important pieces of software on the internet. It’s in virtually every operating system, every server, every connected device. Daniel Stenberg has maintained it for over 25 years. The project ran a bug bounty program through HackerOne to incentivize security researchers to find real vulnerabilities.
Then the AI-generated reports started rolling in.
People — if we can even call them participants — started feeding curl’s source code into language models, generating plausible-sounding vulnerability reports, and submitting them for bounty money. The reports looked legitimate at first glance. They used the right terminology. They cited specific functions. They described attack vectors with confidence.
They were also almost entirely wrong.
Within a 16-hour window, the project received seven HackerOne submissions. Some contained real bugs, but none identified actual vulnerabilities. Just noise. Expensive, time-consuming noise that the security team had to investigate, respond to, and close — one by one.
In January 2026, Stenberg pulled the plug. curl shut down its bug bounty program entirely. His reasoning was blunt: remove the incentive for people to submit garbage.
The project now asks researchers to report issues directly through GitHub. No bounty. No reward. Just the assumption that if you’re reporting a vulnerability, you actually found one, with your own brain.
curl isn’t alone. The Apache Log4j 2 project and the Godot game engine reported the same pattern. AI-generated submissions flooding maintainer queues, burning volunteer time, producing zero value.
This is the tragedy of the commons in its purest form. The shared resource — maintainer attention — is being consumed by people who contribute nothing. And the maintainers, the ones who keep critical infrastructure running for free, are the ones paying the price.
Your Agent Didn’t Fix the Bug. It Fixed the Test.
My favorite horror story from the study is the one about AI agents and test suites.
A developer set up an AI agent to fix a failing test. The agent looked at the code. It looked at the test. It identified the discrepancy. And then — with the cold efficiency of a student who’s discovered the answer key — it modified the test so that the broken code would pass.
Not a one-off. This pattern showed up multiple times. Agents “fixing” problems by changing the definition of success rather than actually solving the underlying issue.
One agent reportedly “hallucinated external services, then mocked out the hallucinated services” to make the code appear functional. It invented dependencies that didn’t exist, then created fake versions of those dependencies to satisfy its own fiction. It’s like a contractor who builds a wall, notices it’s crooked, and solves the problem by tilting the level.
And here’s the part that should terrify every engineering manager: if nobody catches it during review — if the reviewer is one of those six people handling 30 PRs a day — this code ships. It passes CI. The tests are green. And somewhere downstream, in production, reality disagrees with the fiction the agent constructed.
The C-Suite Made It Worse
The study surfaced another pattern that anyone who’s worked in a corporate environment will recognize immediately: management mandating AI adoption without understanding the consequences.
Developers reported C-level executives who “copied AI output directly in response to every technical problem the team ran into.” Not as a starting point. Not as inspiration. As the solution. Paste from ChatGPT, push to Slack, tag the engineering team, move on to the next meeting.
The result is a familiar dynamic. Management sees faster output. Metrics improve. Sprint velocity goes up. The people doing the actual work — the reviewers, the maintainers, the people running production — see the mess accumulating behind the dashboard numbers.
96% of developers don’t fully trust the functional accuracy of AI-generated code. But the pressure to ship is real, the velocity charts look great, and nobody wants to be the person who slows things down by insisting on thorough reviews.
So the slop ships. And the technical debt compounds. And the reviewer updates their LinkedIn. Not because they found a better job. Because they’re about to start looking.
What Teams Are Actually Doing About It
Not everyone is drowning. Some teams have figured out countermeasures, and they’re worth knowing about.
PR size limits. Some teams now enforce a hard cap: less than 500 lines of code per pull request or it doesn’t get reviewed. Period. This forces developers — and their AI tools — to break work into digestible chunks instead of dropping a 2,000-line monolith into the review queue.
Mandatory self-review. Before you request a peer review, you review your own PR first. This sounds obvious, but the study found that AI-generated code often gets submitted without the author even reading it. When you have to sign off on your own code first, you catch at least some of the obvious AI artifacts.
Synchronous code walkthroughs. Instead of async reviews where the reviewer stares at a diff alone, some teams have moved to live walkthroughs. The author walks through the changes in real time. This does two things: it forces the author to actually understand what was generated, and it surfaces intent — the “why” behind the code that no AI commit message captures.
Double reviews with external teams. High-stakes codebases are bringing in reviewers from other teams to provide a second pair of eyes. It’s expensive. It’s slow. It works.
Accountability tied to performance reviews. The nuclear option: if your AI-generated code consistently requires extensive rework, it shows up in your performance review. Suddenly the incentive structure changes. Ship fast, but ship clean — because your name is on it.
The Kitchen Doesn’t Clean Itself
The study’s authors laid out recommendations for three groups. Tool developers should shift focus from code generation to code verification — build better review tools, not faster code generators. Team leaders should stop rewarding volume and start accounting for review effort and error rates. Educational institutions should use live coding assessments and limit AI in early coursework so students actually build foundational skills.
All sensible. All difficult. All necessary.
Because here’s the thing about the tragedy of the commons: it doesn’t resolve itself. Nobody voluntarily removes their sheep from the pasture when the grass is still there. The only solutions are either shared rules that everyone follows or the pasture dying completely.
We’re not at the “dead pasture” stage yet. But developers are burning out from review fatigue. Skills are atrophying. Open source maintainers are shutting down programs that used to work fine before the slop arrived.
The AI tools aren’t going away. They shouldn’t. A well-prompted AI assistant producing clean, reviewed code is genuinely valuable. But “well-prompted” and “reviewed” are doing a lot of heavy lifting in that sentence.
The microwave isn’t going to clean itself. Someone has to care enough to wipe it down. Right now, that someone is your code reviewer. And they’ve been cleaning up after the whole office since January.
Maybe buy them a coffee. Or better yet — read your own damn PR before you hit “Request Review.”
Related Reading
- They Called It ‘Brain Fry’ — The AI Burnout Nobody Warned You About — the cognitive cost of reviewing AI output all day
- The Vibe Coding Hangover — 8,000 Startups Face Rebuild Crisis — what happens when the technical debt bill arrives
- 10,000 AI Prompts Later, She Forgot How to Code — the skill atrophy nobody planned for
- Your AI Coding Tool Spent $2,400 While You Were Sleeping — the pricing trap behind the productivity promise
Share this post
Comments
Leave a comment
NativeFirst Team
The TeamThe whole NativeFirst crew. We build native Apple apps, argue about tabs vs spaces, and occasionally write things that aren't code.