Death by a billion tokens

Did you recently receive a plushie or a plaque in the mail from your favorite model provider? Maybe a branded hoodie for hitting some token consumption milestone. Or perhaps your team has a leaderboard up on a Slack channel where everyone competes to see who can burn through the most AI credits in a sprint.

In ancient China, there was a method of execution called death by a thousand cuts — each individual incision barely noticeable, none fatal on its own, but cumulatively they drained the life out of you. That is what the model providers are doing to your budget right now. One token at a time. A query here, a completion there, a background agent churning through context windows while nobody is watching. No single call breaks the bank, but a billion small ones will, and the silence between the cuts is by design. You are being executed in slow motion and the executioner is sending you a teddy bear.

The Flex That Aged Poorly

There was a brief window, maybe six months, where tokenmaxxing (running up your token usage and sharing the screenshot in Slack) felt like a flex. “Look how much I’m shipping with AI.” The bigger the number, the more productive you appeared. Teams gamified consumption. Vendors encouraged it, because every token you burn is revenue for them.

That window has closed.

What was once a status symbol is now a red flag — the same way a prestigious employer badge or accelerator logo once stood in for actual substance until the numbers caught up. The credential has never been the business, and the token counter is no different. Finance departments have caught on. Budgets are being scrutinized. And the companies that were celebrating token volume are now the ones getting emergency calls from their CFOs.

Uber’s $1,200 Demo

Uber’s CTO ran a two-hour demo using AI tools. The token bill: $1,200. For two hours. That’s not a demo — that’s a consulting engagement.

And that’s just the visible part. Uber blew through its entire 2026 AI budget by April. Four months. The full year’s allocation for AI tooling, consumed in a third of the fiscal year. When you hear numbers like that, the question isn’t “how innovative is the engineering team?” The question is “what happens for the eight months still left in the fiscal year now that the budget is gone?”

Uber is not alone. Microsoft quietly canceled the majority of its internal Claude Code licenses, pulling thousands of engineers off a tool they had adopted enthusiastically. The reason wasn’t technical — it was financial. Finance pulled the emergency brake before the new fiscal year could compound the problem.

The pattern is the same everywhere. We saw this coming — AI-generated throughput is quietly crushing the infrastructure beneath it — and the budget side of that equation is arriving faster than most teams expected. AI tooling was purchased as a productivity investment. It’s functioning as an unpredictable utility bill with no circuit breaker.

Amazon’s Leaderboard Problem

Amazon recently had to take down its internal AI leaderboard. Why? Because engineers were gaming the system. They figured out that tokenmaxxing — running prompts through the most expensive models and padding their token counts — was the path to the top of the board. The incentive structure rewarded consumption, so that’s what they optimized for.

This is what happens when you measure the wrong thing — the same category error as mistaking code generation speed for engineering effectiveness. What makes an engineer genuinely valuable has never been raw throughput, and token volume is just the latest proxy that misses the point. If your metric is “how many tokens did you use this week,” you will get engineers who use as many tokens as possible. You will not get engineers who solve problems efficiently. You will get engineers who have discovered that the path to recognition is burning through your budget.

The leaderboard got taken down because it was doing exactly what leaderboards do: it incentivized the behavior being measured, regardless of whether that behavior was valuable.

The Real Challenge

Here is what nobody is leaderboarding, because it is harder to measure and harder to gamify: the inverse of tokenmaxxing. Doing the same amount of work using the fewest tokens possible.

The engineer who can accomplish a task with 10,000 tokens of a small, local model instead of 100,000 tokens of a frontier model is not showing off. They are not posting screenshots in Slack. They are not getting plushies. But they are the engineer who is actually delivering value.

The real skill in this era is knowing which tool to use for which job — the same judgment that separates a platform engineer from someone who just knows how to press buttons. It is no coincidence that the most technically ambitious CTOs are stepping back into platform IC roles to work on exactly this kind of problem. Frontier models for complex coding and reasoning tasks. Free or cheap models for summarization, classification, and the million other simple queries that make up the bulk of AI usage. The engineer who can route work to the appropriate model — and measure the dollar cost of their decisions — is the one who will be employed at the end of this cycle.
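To make the routing idea concrete, here is a minimal sketch of what "route the work and price the decision" could look like. Everything in it is an assumption for illustration: the model names, the per-million-token prices, and the task categories are hypothetical placeholders, not any vendor's real catalog or rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_million_input: float   # hypothetical price, USD per 1M input tokens
    usd_per_million_output: float  # hypothetical price, USD per 1M output tokens


# Placeholder models and prices, chosen only to illustrate the gap in cost.
CHEAP = Model("cheap-small", 0.10, 0.40)
FRONTIER = Model("frontier-large", 3.00, 15.00)

# Assumed set of task types that a small model handles well enough.
SIMPLE_TASKS = {"summarization", "classification"}


def route(task_type: str) -> Model:
    """Send simple task types to the cheap model, everything else to the frontier model."""
    return CHEAP if task_type in SIMPLE_TASKS else FRONTIER


def call_cost(model: Model, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single call at the model's per-million-token prices."""
    return (
        input_tokens * model.usd_per_million_input
        + output_tokens * model.usd_per_million_output
    ) / 1_000_000


if __name__ == "__main__":
    # Same token counts, two different routing decisions.
    for task_type in ("summarization", "complex_coding"):
        model = route(task_type)
        cost = call_cost(model, input_tokens=8_000, output_tokens=2_000)
        print(f"{task_type}: {model.name} costs ${cost:.4f}")
```

The point of the sketch is the second function, not the first. A routing rule is only useful if the dollar figure sits right next to it, so the engineer sees what each decision costs.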

We would like to see a leaderboard for that. A leaderboard that rewards efficiency instead of consumption. A leaderboard that tracks value delivered per dollar spent, not raw token volume. A leaderboard where the winner is the person who did the most with the least. That is the mark of a true engineer in this era. The person who burns the most tokens is not an engineer. They are a spend channel.
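A rough sketch of such a leaderboard, under one large assumption: that "value delivered" can be approximated by a count of completed tasks. That proxy is ours and is itself gameable, so treat it as a starting point rather than a finished metric. The names and figures below are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineerUsage:
    name: str
    tasks_completed: int  # assumed proxy for value delivered
    usd_spent: float      # AI spend attributed to this engineer


def tasks_per_dollar(usage: EngineerUsage) -> float:
    """Completed tasks per dollar of AI spend."""
    if usage.usd_spent == 0:
        # Work done with no spend ranks first; no work and no spend scores zero.
        return float("inf") if usage.tasks_completed > 0 else 0.0
    return usage.tasks_completed / usage.usd_spent


def efficiency_leaderboard(usages: list[EngineerUsage]) -> list[tuple[str, float]]:
    """Rank engineers by tasks per dollar, most efficient first."""
    ranked = sorted(usages, key=tasks_per_dollar, reverse=True)
    return [(u.name, tasks_per_dollar(u)) for u in ranked]


if __name__ == "__main__":
    # Hypothetical sprint data: the biggest spender is not the winner.
    sprint = [
        EngineerUsage("heavy-spender", tasks_completed=25, usd_spent=500.0),
        EngineerUsage("careful-router", tasks_completed=20, usd_spent=40.0),
    ]
    for name, score in efficiency_leaderboard(sprint):
        print(f"{name}: {score:.2f} tasks per dollar")
```

Note that the engineer with the higher task count still loses here, because each of those tasks cost far more to produce.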

The BitTorrent Analogy

Imagine, in the early 2000s, a company encouraging its employees to download as much as possible over the corporate internet connection. They set up a leaderboard for who could consume the most bandwidth. The winner gets a prize. The IT department gets a throttled network and an angry phone call from the CFO.

That sounds ridiculous. Nobody would do that.

Yet that is exactly what companies are doing with AI tokens right now.

The difference is that bandwidth was understood as a shared resource. AI tokens are still treated as magic beans that fell from the sky. They didn’t. Every token has a dollar cost, and someone is paying it. The same structural risk that makes SaaS dependency a security exposure you cannot fully audit applies here — you are relying on a vendor whose incentives do not align with yours. If you are not measuring cost per task, you are not managing AI — you are subsidizing your vendors’ margins.

The Macro Picture Nobody Wants to Talk About

The shovel-sellers are having an incredible year. NVIDIA, the model providers, the infrastructure vendors — they are capturing the vast majority of the value flowing into AI. McKinsey projects $5.2 trillion in AI infrastructure spending by 2030, with 60% of that flowing straight to semiconductor companies.

The gold miners? A different story.

MIT found that human workers are still cheaper than AI in 77% of automation scenarios. Not faster, not better: cheaper. The same structural math that makes owning your infrastructure dramatically cheaper than renting from cloud providers applies to AI tooling too. The companies buying at retail and celebrating volume are the ones getting arbitraged. The Yale Budget Lab found no systemic productivity gains across the broader economy from AI adoption. Not marginal. Not localized. Systemic. Zero.

The companies buying the tooling are not seeing the returns. The companies selling the tooling are reporting record quarters. That divergence is not sustainable, and it is already correcting. The Microsoft license cancellations and the Uber budget blowouts are early signals of a correction that is going to ripple through the entire industry.

Measuring What Matters

The question that every engineering leader should be asking right now: are you measuring AI success by adoption rate, or by value delivered per token consumed?

Adoption rate is a vanity metric. It tells you how many people clicked the button. It does not tell you whether the button produced anything worthwhile.

Value per token is the real metric. It forces hard conversations about model selection, prompt design, and whether a given task needed AI at all. It surfaces the engineers who are actually skilled at using these tools, rather than the ones who have simply figured out how to burn through credits fastest.
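Getting to a per-task number does not require much machinery. The sketch below assumes you already log each model call with a task identifier and a dollar cost; the log format and the ticket-style IDs are hypothetical, and the value side of the metric still has to come from your own judgment of what each task was worth.

```python
from collections import defaultdict
from typing import Iterable


def cost_per_task(calls: Iterable[tuple[str, float]]) -> dict[str, float]:
    """Sum the dollar cost of every model call, grouped by task ID.

    Each entry in `calls` is an assumed (task_id, usd_cost) pair, one per call.
    """
    totals: dict[str, float] = defaultdict(float)
    for task_id, usd_cost in calls:
        totals[task_id] += usd_cost
    return dict(totals)


def average_cost_per_task(calls: Iterable[tuple[str, float]]) -> float:
    """Mean dollar cost across distinct tasks; 0.0 if there are no calls."""
    totals = cost_per_task(calls)
    return sum(totals.values()) / len(totals) if totals else 0.0


if __name__ == "__main__":
    # Hypothetical call log: two calls for one ticket, one call for another.
    log = [("TICKET-1", 0.50), ("TICKET-1", 0.25), ("TICKET-2", 1.25)]
    print(cost_per_task(log))
    print(average_cost_per_task(log))
```

Once the per-task totals exist, the hard conversations follow naturally: the expensive tasks are the ones to ask about model choice, prompt design, and whether AI was needed at all.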

The companies that switch to measuring efficiency instead of consumption will be the ones that still have AI budget in October. The ones still celebrating token volume will be the ones explaining to their board why they need a budget increase in Q3.

What We’re Seeing at BootstrapVC

Across our portfolio, we are seeing a clear divide forming. The teams that treat AI tooling as an expensive resource to be optimized are extending their runway and delivering measurable ROI. The teams that treat it as an infinite free lunch are running out of money and blaming the tools.

We are advising every company we work with to start tracking cost per task, not tokens consumed. To build internal benchmarks for efficiency. To reward engineers who can do more with less. And to stop celebrating consumption as if it were productivity.

The companies that figure this out now will have a structural cost advantage that compounds over time. The ones that do not will find their AI initiatives defunded by a finance department that has learned to ask the right questions.

If you are building in this space — whether you are a founder trying to keep AI costs under control or an engineer who wants to build the right habits early — we would like to talk.

Take a step back. Breathe. And then hear this: As Joey from Friends would say — you’ve been bamboozled!

The plushie was never the point. The plaque was never the point. The point was delivering value. That has not changed.
