AI Made Developers 19% Slower. They Swore It Made Them 20% Faster.

NativeFirst Team 8 min read
[Image: a runner on a treadmill in an empty gym, running fast and going absolutely nowhere]

You know that guy at the gym who runs on the treadmill for an hour, drenched in sweat, absolutely convinced he just burned 800 calories — and then checks the display and it says 210?

That’s you with AI coding tools right now. That’s all of us.


The Study Nobody Wanted to Hear

In July 2025, METR — an AI research lab funded by Open Philanthropy — published a study that should have been a wake-up call. They paid 16 experienced open-source developers $150 per hour to complete real tasks on their own codebases. Half the time they used AI tools. Half the time they didn’t.

The result: developers were 19% slower with AI assistance.

But here’s the part that makes this a paradox and not just bad news: those same developers believed they were 20% faster. That is a 39-point gap between perception and reality. Before the study, the developers themselves had forecast a 24% speedup, and external experts (economists, ML researchers, people who think about productivity for a living) had predicted 38% to 39%.
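
The 39-point figure is plain arithmetic. Here is a minimal sketch that treats both numbers as signed percentage changes in speed, the way this post does; the function and variable names are illustrative and not taken from METR.

```python
# Signed change in speed, in percent: negative means slower.
# The two figures are the ones quoted above; the names are made up for this sketch.
measured_change = -19   # developers were 19% slower with AI
perceived_change = 20   # developers believed they were 20% faster


def perception_gap(perceived: float, measured: float) -> float:
    """Gap between belief and measurement, in percentage points."""
    return perceived - measured


gap = perception_gap(perceived_change, measured_change)
print(f"Perception gap: {gap} percentage points")
```

The gap is measured in percentage points, not percent, because it is the difference between two percentages.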

Everyone was wrong. Everyone was wrong in the same direction.

This isn’t like misjudging a restaurant wait by five minutes. This is ordering a $150 meal, eating it, feeling full, and then discovering you somehow have less food in your stomach than when you arrived.


“I Won’t Code Without AI. Not Even for $150 an Hour.”

METR tried to run a follow-up study in early 2026. They hit a wall nobody saw coming.

Between 30 and 50 percent of invited developers refused to participate — because the study required them to sometimes code without AI tools. At $150 per hour. On their own projects.

Let that sink in. Developers would rather turn down $150/hour than write code without an AI autocomplete for a few hours. That’s not a productivity preference. That’s a dependency.

And it gets worse. The developers who refused were probably the ones who benefit most from AI, which means any sample of people willing to take part skews toward developers who are already good without it. The updated study with a broader cohort (57 developers, 800+ tasks) showed a smaller slowdown of about 4%, but the confidence interval stretched from -15% to +9%. Translation: we’re still not sure AI helps, we’re just less sure it hurts.


Uber Burned Through a Year’s AI Budget in Four Months

While METR was trying to recruit developers who’d agree to briefly use their own brains, Uber was running a different kind of experiment.

By April 2026, Uber had burned through its entire annual AI coding budget. Four months. Done. Monthly API costs per engineer ranged from $500 to $2,000 for tools like Claude Code and Cursor. Ninety-five percent of their engineers were using AI monthly. Seventy percent of committed code was AI-generated.

COO Andrew Macdonald said the quiet part out loud about the connection between AI spending and consumer-facing innovation: “That link is not there yet.” They hadn’t shipped more features. They hadn’t fixed more bugs. They’d just burned more tokens.

Uber’s response? A $1,500 monthly cap per employee on agentic tool spending. The era of unlimited AI at work lasted about as long as unlimited PTO — which is to say, until someone actually used it.


Meanwhile, at Amazon

Remember tokenmaxxing? Amazon created an internal leaderboard called “KiroRank” to track developer AI usage on their Kiro platform. The idea was probably to encourage adoption. What actually happened was that employees started spinning up pointless AI agents just to climb the rankings.

SVP Dave Treadwell had to tell staff: “Please don’t use AI just for the sake of using AI.” A sentence that should be printed on a poster and hung in every engineering org on the planet.

Amazon killed Q Developer and bet everything on Kiro. They killed the leaderboard too. Replaced it with “normalized deployments” — measuring whether AI produces useful output, not just raw token volume. The adult-in-the-room metric nobody wanted.


The Code Isn’t Even Good

You might be thinking: “Okay, maybe it’s slower, but the code quality is at least better, right?”

CodeRabbit analyzed 470 real-world open-source pull requests in December 2025. AI-generated code had 1.7 times more issues than human-written code. But the breakdown is where it gets grim:

  • Logic and correctness issues: up 75%
  • Security vulnerabilities: up 1.5–2x
  • Readability problems: up 3x
  • Performance inefficiencies: up nearly 8x

That last one. Nearly eight times more performance problems. You’re generating code faster, and it runs slower. The treadmill metaphor is doing double duty here.

And the security story is even scarier. An NYU study tested Copilot across 89 code-generation scenarios and found that about 40% of the generated programs contained security vulnerabilities: SQL injection, XSS, buffer overflows, hardcoded credentials. A separate Stanford study delivered the real gut punch: developers using AI wrote less secure code and believed it was more secure.
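
SQL injection, the first item on that list, takes only a few lines to demonstrate. The sketch below is not code from either study. It uses Python’s built-in sqlite3 module and a made-up `users` table to show the vulnerable string-splicing pattern next to the parameterized version a reviewer should insist on.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Vulnerable: user input is spliced directly into the SQL text.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized: the value is passed separately and never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


def make_demo_db():
    # Hypothetical in-memory table, only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


conn = make_demo_db()
payload = "nobody' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # the injected OR clause matches every row
print(len(find_user_safe(conn, payload)))    # the payload is treated as a literal name
```

Both functions compile, run, and return the right answer for ordinary names, which is exactly why the unsafe one survives a casual review.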

There’s that perception gap again. A feature, not a bug — of the human brain, not the AI.


84% Use It. 29% Trust It.

The Stack Overflow Developer Survey tells the rest of the story in three data points:

  • Developer favorability toward AI coding tools: 77% (2023) → 72% (2024) → 60% (2025)
  • Developers who currently use AI: 84%
  • Developers who trust AI output accuracy: 29%

Eighty-four percent of developers use a tool that only twenty-nine percent of them trust. We’ve all become that person who keeps going back to the restaurant they complain about. “The food is terrible and the portions are too small” — except the portions are actually enormous and full of security vulnerabilities.

At the team level, the numbers tell a similar story: 98% more pull requests but 91% longer review times. Code churn — the percentage of code that gets changed or thrown away within 30 days — rose from 3.1% to 5.7%. You’re writing more code. You’re also deleting more code. The net velocity? A shrug emoji.
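
To make the churn definition concrete, here is a rough sketch of how that percentage could be computed. The `LineRecord` data model is hypothetical; real tools derive this from version-control history, and their exact rules may differ.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class LineRecord:
    # Hypothetical record for one added line of code.
    added_on: date
    reworked_on: Optional[date] = None  # first change or deletion, if any


def churn_rate(records, window_days=30):
    """Percent of added lines changed or deleted within window_days of being added."""
    if not records:
        return 0.0
    window = timedelta(days=window_days)
    churned = sum(
        1
        for r in records
        if r.reworked_on is not None and r.reworked_on - r.added_on <= window
    )
    return 100.0 * churned / len(records)


records = [
    LineRecord(date(2025, 1, 1), date(2025, 1, 10)),  # reworked after 9 days: churn
    LineRecord(date(2025, 1, 1), date(2025, 3, 1)),   # reworked after 59 days: not churn
    LineRecord(date(2025, 1, 1)),                     # never touched
    LineRecord(date(2025, 1, 1)),                     # never touched
]
print(churn_rate(records))  # 25.0
```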


So What Do You Actually Do?

This isn’t a “throw away your AI tools” post. That ship has sailed: remember, 30 to 50 percent of the developers METR invited wouldn’t work without them, not even for $150 an hour.

But there’s a difference between using AI tools and being used by AI tools. A few things the data actually suggests:

Stop measuring tokens. Start measuring deployments. Amazon learned this the hard way. If your metric is “how much AI did we use,” you’ll optimize for usage, not output. Measure what shipped, what stayed shipped, and what didn’t come back as a bug.
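
One way to picture that kind of metric is sketched below. It is an assumption-heavy illustration, not Amazon’s “normalized deployments” formula (which this post doesn’t define): the `Deployment` record and its fields are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    # Hypothetical record; the field names are illustrative.
    name: str
    rolled_back: bool = False   # did it stay shipped?
    bugs_traced_back: int = 0   # bug reports later attributed to it


def delivery_summary(deployments):
    """Count what shipped, what stayed shipped, and what stayed clean."""
    stayed = [d for d in deployments if not d.rolled_back]
    clean = [d for d in stayed if d.bugs_traced_back == 0]
    return {
        "shipped": len(deployments),
        "stayed_shipped": len(stayed),
        "clean": len(clean),
    }


month = [
    Deployment("checkout-fix"),
    Deployment("new-onboarding", rolled_back=True),
    Deployment("search-cache", bugs_traced_back=2),
]
print(delivery_summary(month))
```

Token counts appear nowhere in the inputs, so burning more tokens can’t move any of the three numbers.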

Treat AI output like a junior developer’s first draft. Review it. Question it. Don’t merge it just because it compiles. The nearly 8x rate of performance problems suggests your AI assistant doesn’t know or care about O(n²) vs O(n). You do. That’s the job.
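
Here is the kind of O(n²) vs O(n) difference worth catching in review. This is an illustrative pair, not an example from the CodeRabbit data: two duplicate checks that return the same answer but scale very differently.

```python
def has_duplicates_quadratic(items):
    # Compares every pair of elements: O(n^2) comparisons in the worst case.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_linear(items):
    # One pass with a set: O(n) on average. Assumes the items are hashable.
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False


sample = [3, 1, 4, 1, 5]
print(has_duplicates_quadratic(sample), has_duplicates_linear(sample))
```

Both versions pass the same tests, so only a reviewer thinking about input size will notice the difference.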

Invest in fundamentals, not tools. Tools come and go — Windsurf just became Devin Desktop overnight, Copilot switched to usage-based billing, Q Developer got killed entirely. What doesn’t change is your understanding of architecture, performance, and security. If you’re an iOS developer, the SwiftUI at Scale curriculum is the kind of investment that compounds regardless of which AI tool is hot this quarter.

Be honest about the perception gap. The 39-point gap in the METR study isn’t a flaw in the study design. It’s a flaw in human cognition. AI makes the act of coding feel smoother — autocomplete is satisfying, generated tests are convenient, chat-based debugging feels productive. But feeling productive and being productive are different things. The treadmill display doesn’t lie.


The Uncomfortable Bottom Line

Here’s the thing nobody in the AI tool space wants to say: the current generation of AI coding tools might be making individual developers feel faster while making engineering teams measurably slower. More code, more PRs, more commits — and longer review cycles, more bugs, more security holes, and blown budgets.

That doesn’t mean AI coding tools are useless. It means we’re in the aspirin phase — where we take the pill, feel better, and assume the headache is gone without checking whether the underlying problem is still there.

The developers who will thrive aren’t the ones who use AI the most. They’re the ones who know when to use it, when to ignore it, and — critically — when their brain is lying to them about how fast they’re going.

Check the treadmill display. You might not like what you see.

NativeFirst Team

Editorial

The NativeFirst team — engineers and designers building native Apple apps and writing the courses we wish we had when we started.