Tokenmaxxing: Amazon Built an AI Leaderboard and Employees Strapped It to the Dog

NativeFirst Team 9 min read

You know how some companies gave everyone Fitbits and started a step-count leaderboard? The idea was beautiful. Healthy employees, friendly competition, insurance discounts. What could go wrong?

Within three weeks, people were strapping the watches to their golden retrievers. Taping them to ceiling fans. One guy on Reddit admitted he put his on a paint shaker at Home Depot during lunch breaks. The leaderboard said the engineering department averaged 28,000 steps a day. Nobody lost a single pound.

That’s basically what just happened at Amazon. Except instead of steps, it was AI tokens. And instead of nobody losing weight, nobody shipped better code.


Amazon’s KiroRank: The Leaderboard That Ate Itself

On May 29, 2026, Amazon quietly pulled the plug on an internal dashboard called KiroRank, a leaderboard that tracked how many AI tokens employees consumed through Kiro, Amazon’s agentic AI coding environment.

The original intent was reasonable. Amazon’s leadership wanted to drive AI adoption across engineering teams. They’d been pushing for more than 80% of developers to use AI tools weekly. KiroRank was supposed to show who was embracing the future and who was dragging their feet.

What actually happened was predictably absurd.

Employees started tokenmaxxing — running pointless tasks through AI agents purely to inflate their usage scores and climb the leaderboard. Not fixing bugs. Not shipping features. Just feeding the machine meaningless busywork so their little number went up.

Think about that for a second. Amazon, the company that invented the two-pizza team and obsesses over customer metrics, accidentally incentivized its engineers to waste compute on nothing. It’s like putting a “lines of code written per day” counter on every developer’s desk. Decades ago, we all agreed that was an absurd metric. In 2026, we renamed it and did it again.

Dave Treadwell, Amazon’s SVP of Engineering, sent a message to staff that should be printed on a poster in every tech company’s cafeteria: “Please don’t use AI just for the sake of using AI. Use AI to help you solve customer problems, to help you solve business problems, to innovate.”

Amazon replaced KiroRank with a “normalized deployments” metric — measuring AI-assisted code that actually ships. Which is what they should have been tracking from the start.


Uber: How to Burn a Year’s Budget in Four Months

Amazon wasn’t alone in this expensive lesson.

Uber exhausted its entire 2026 AI token budget in the first four months of the year. COO Andrew Macdonald said on a podcast that the spending “hadn’t led to a measurable increase in projects or productivity.”

Let me say that again: they spent the whole year’s AI budget by April. And got nothing measurable for it.

This isn’t a startup burning through seed money. This is Uber. A company with actual financial controls and a CFO who presumably asks questions. And they still got caught in the tokenmaxxing trap — treating consumption as a proxy for progress.
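The arithmetic behind “a year’s budget by April” is worth spelling out: spending 12 months of budget in 4 months means running at roughly three times the planned rate. Here is a minimal Python sketch of the kind of burn-rate check that would catch this early. It assumes a budget meant to be spent evenly across the year (a simplification). The function names and the 1.25x alert threshold are hypothetical illustrations, not anything Uber has described.

```python
# Minimal burn-rate sketch. Assumes the annual budget is meant to be spent
# evenly across 12 months (a simplification). Names and the alert threshold
# are hypothetical.

PLAN_MONTHS = 12
ALERT_MULTIPLE = 1.25  # hypothetical: flag spend running 25% ahead of plan


def burn_multiple(months_elapsed: float, fraction_spent: float) -> float:
    """How many times faster than plan the budget is being consumed."""
    planned_fraction = months_elapsed / PLAN_MONTHS
    return fraction_spent / planned_fraction


def months_until_exhausted(months_elapsed: float, fraction_spent: float) -> float:
    """Months from the start of the year until the budget runs out at the current rate."""
    monthly_rate = fraction_spent / months_elapsed
    return 1.0 / monthly_rate


def should_alert(months_elapsed: float, fraction_spent: float) -> bool:
    """True if spending is running faster than the alert threshold allows."""
    return burn_multiple(months_elapsed, fraction_spent) > ALERT_MULTIPLE


# Uber's reported outcome: the whole annual budget gone after 4 months.
print(round(burn_multiple(4, 1.0), 2))           # 3.0
# The same trajectory seen at month 2: half the budget already spent.
print(round(burn_multiple(2, 0.5), 2))           # 3.0
print(round(months_until_exhausted(2, 0.5), 2))  # 4.0, i.e. gone by the end of April
print(should_alert(2, 0.5))                      # True
```

The point of a check like this is timing. At month 2 the projection already says the money runs out in April, which is when someone should be asking what the tokens are buying.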

Microsoft went in the opposite direction. They straight-up cancelled Claude Code subscriptions for employees in several key product divisions. When one of the richest companies on Earth decides the AI bill isn’t worth paying, that’s not a budget trim. That’s a verdict.

Meta took down its informal tokenmaxxing leaderboard too. The pattern was identical: employees gaming usage metrics, costs spiraling, and leadership eventually asking the uncomfortable question — “Wait, what did we actually get for all this?”


The $1 Breakdown That Should Make CTOs Cry

A study of 2,444 companies in 2026 broke down exactly where every dollar of AI token spending goes. The numbers are brutal:

  • $0.44 — fixing bugs that the AI generated
  • $0.27 — rewriting AI-produced code that wasn’t up to standard
  • $0.11 — review delays and merge conflicts from AI output
  • $0.18 — actual useful, net-new productive work

Read that again. For every dollar spent on AI tokens, 82 cents goes toward cleaning up the AI’s own mess.
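The 82-cent figure follows directly from the four buckets above. The same numbers also support a sharper framing: if only 18 cents of each dollar is net-new work, buying one dollar of useful output costs about $5.56 in total token spend. This short Python check uses only the figures from the study as quoted above and works in cents so the sums come out exact. The bucket labels are paraphrases, not the study’s own category names.

```python
# Per-dollar breakdown as quoted above, in cents to avoid float rounding.
# Labels are paraphrased from the list in the article.
BREAKDOWN_CENTS = {
    "fixing AI-generated bugs": 44,
    "rewriting substandard AI code": 27,
    "review delays and merge conflicts": 11,
    "net-new productive work": 18,
}
PRODUCTIVE = "net-new productive work"


def cleanup_cents(breakdown: dict[str, int]) -> int:
    """Cents of every dollar that go to anything other than net-new work."""
    return sum(cents for item, cents in breakdown.items() if item != PRODUCTIVE)


def spend_per_productive_dollar(breakdown: dict[str, int]) -> float:
    """Dollars of token spend needed to buy $1 of net-new productive work."""
    return sum(breakdown.values()) / breakdown[PRODUCTIVE]


assert sum(BREAKDOWN_CENTS.values()) == 100  # the four buckets cover the whole dollar
print(cleanup_cents(BREAKDOWN_CENTS))                          # 82
print(round(spend_per_productive_dollar(BREAKDOWN_CENTS), 2))  # 5.56
```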

This is the corporate equivalent of hiring a painter who slops paint everywhere and then billing you for the cleanup crew. Except you thought the painter was making you money because he was using a lot of paint.


The METR Paradox: Slower, But Feeling Faster

Here’s the part that really bends your brain.

We covered the METR study back in March — experienced developers were actually 19% slower with AI tools while believing they were 20% faster. The perception gap was almost perfectly inverted from reality.

But the story got weirder. In February 2026, METR tried to repeat the experiment. They couldn’t. Developers refused to participate because they wouldn’t work without AI tools — not even temporarily, not even for $50/hour.

Think about what that means. Developers are so attached to tools that measurably slowed them down in a controlled study that they won’t give them up, even temporarily, for an experiment that might show they’re faster without them. It’s like refusing to take off ankle weights because you’ve convinced yourself they make you run faster.

TechCrunch reported on this in May: developers self-reported that AI made them “twice as valuable” to their organizations. Meanwhile, CodeRabbit’s analysis of open-source pull requests found that AI-generated code produced 1.7x more issues than human-written code. And Entelligence AI’s CEO reported that companies are spending 44% of all their AI tokens fixing bugs the AI itself generated.

So we feel twice as valuable while shipping 1.7x more issues and spending nearly half our token budget fixing them. Peak 2026 engineering.


Jensen Huang’s Half-Salary Rule

Nvidia CEO Jensen Huang threw gasoline on this fire when he told engineers they should be consuming AI tokens worth at least half their annual salary each year to be “fully productive.”

Let that sink in. If you make $200K, Huang thinks you should be burning $100K in AI tokens annually. Not based on outcomes. Not based on shipped features or fixed bugs. Based on consumption.

This belongs in the Goodhart’s Law hall of fame: “When a measure becomes a target, it ceases to be a good measure.” Except in this case, the measure was never good to begin with. Token consumption is not productivity. It never was. It’s like measuring a restaurant’s quality by how much gas the stove uses.

But when the CEO of the company selling the GPUs says “buy more tokens,” well… managers listened. And employees, facing implicit pressure to hit usage targets, found creative ways to tokenmax their way to compliance.


Why This Matters for Developers (Especially iOS Developers)

If you’re an indie developer or work at a smaller company, you might be watching this Big Tech circus and laughing. Fair enough. But there are real lessons here.

The tool isn’t the strategy. Whether you’re using Cursor, Claude Code, or Xcode’s AI features, the value comes from what you build, not how many tokens you burn building it. I’ve seen solo developers ship polished iOS apps in a weekend using AI intelligently — and I’ve seen funded teams burn through $50K in API costs and produce something that crashes on launch.

Metrics that don’t connect to outcomes are dangerous. If your freelance workflow measures “hours spent coding” rather than “features delivered” or “bugs fixed,” you’re running your own personal KiroRank. Track what matters. For iOS developers, that’s App Store ratings, crash-free sessions, and user retention — not how many Copilot suggestions you accepted.
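Unlike token counts, these metrics track what users actually experience, and they need very little machinery. Here is a minimal Python sketch that computes crash-free session and crash-free user rates from a hypothetical `Session` record. In practice these figures usually come straight from your crash-reporting dashboard, and the record type and function names here are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical session record: who used the app and whether it crashed."""
    user_id: str
    crashed: bool


def crash_free_session_rate(sessions: list[Session]) -> float:
    """Percentage of sessions that did not end in a crash."""
    if not sessions:
        raise ValueError("no sessions to measure")
    clean = sum(1 for s in sessions if not s.crashed)
    return 100.0 * clean / len(sessions)


def crash_free_user_rate(sessions: list[Session]) -> float:
    """Percentage of distinct users who never hit a crash."""
    if not sessions:
        raise ValueError("no sessions to measure")
    users = {s.user_id for s in sessions}
    crashed_users = {s.user_id for s in sessions if s.crashed}
    return 100.0 * (len(users) - len(crashed_users)) / len(users)


sessions = [
    Session("alice", crashed=False),
    Session("alice", crashed=True),
    Session("bob", crashed=False),
    Session("carol", crashed=False),
]
print(crash_free_session_rate(sessions))         # 75.0
print(round(crash_free_user_rate(sessions), 1))  # 66.7
```

The two numbers diverge on purpose. One crash in four sessions still means a third of these users had a bad experience, which is why it helps to watch both.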

The best AI users are skeptical AI users. The developers who get genuine value from AI tools are the ones who treat them as unreliable junior developers — useful for boilerplate, dangerous for architecture, and always in need of code review. We talk about this a lot in our SwiftUI courses — understanding why code works is the skill that AI can’t replace, and it’s the skill that saves you when AI-generated code breaks at 2 AM in production.

If you’ve been feeling the brain fry from running too many agents in parallel, or if you’ve noticed your coding skills getting rusty because you’re prompting more than programming, tokenmaxxing is the macro-scale version of your micro-scale problem. The whole industry is consuming more and producing less.


The Real Metric Nobody Wants to Track

Here’s the uncomfortable truth: over 80% of companies using AI showed no measurable productivity benefit in 2026. Not a decline. Not a gain. A flat line.

AI coding tools have been mainstream for over two years now. Adoption is at 92%. And the aggregate impact on actual business output is… nothing. The pricing trap is real, the cognitive overhead is brutal, and the industry just spent a quarter figuring out that tracking token consumption is about as useful as tracking keyboard clicks per minute.

Amazon learned the lesson the expensive way. Uber learned it the really expensive way. The rest of us can learn it by watching.

The best code in 2026 isn’t written by the developer who burns the most tokens. It’s written by the developer who knows when to close the AI chat and think for themselves.

Dave Treadwell was right. Don’t use AI just to use AI. Use it to solve problems.

And maybe don’t strap the Fitbit to the dog.



NativeFirst Team

Editorial

The NativeFirst team — engineers and designers building native Apple apps and writing the courses we wish we had when we started.