Strategy5 min read10 Jun 2026

Token Economics: Why AI Pricing Is Quietly Breaking Your SaaS Margins

Per-seat pricing assumed labor was the cost. When the model does the work, your COGS becomes a variable you don't control. Here's the math.

Written by northstar editorial·Updated 10 Jun 2026

The cleanest business model in software history was per-seat SaaS. You charged a fixed amount per user per month, and serving one more user cost you fractions of a cent. Gross margins floated at 85%. Investors could forecast revenue with a spreadsheet and a straight ruler.

AI broke that ruler. And most teams haven't updated their pricing to match.

The hidden variable in your COGS

In classic SaaS, cost of goods sold was a rounding error. Servers, bandwidth, a bit of storage. The marginal cost of a user approached zero, which is exactly why the model printed money.

AI products carry a different cost structure. Every meaningful action — a generated draft, a resolved ticket, an agent run — triggers inference, and inference costs real money that scales linearly with usage. Your COGS is now a variable you only partially control, set in large part by a foundation lab's pricing page.

This is the part founders underweight. Your gross margin is no longer a property of your architecture. It's a property of someone else's API.

Per-seat pricing is lying to you

Here's the failure mode I see most. A team ships an AI feature, prices it per seat because that's what they've always done, and watches a small cohort of power users quietly destroy the unit economics.

The distribution of AI usage is brutally skewed. The top 5% of users in a typical AI product consume 40-60% of the tokens. Under per-seat pricing, that power user pays the same $30 as the colleague who logs in twice a month. You're subsidizing your heaviest users with your lightest ones — and your heaviest users are precisely the ones getting the most value and least likely to churn.

You've built a pricing model that loses more money the more your best customers love you.

The three pricing primitives that actually work in 2026

The teams getting this right combine three things:

A floor. A seat or platform fee that covers your baseline and gives the buyer a predictable line item. Procurement needs a number they can put in a budget.

A metered layer with a generous included allowance. Above the floor, you meter the expensive actions. The allowance has to be generous enough that the median user never hits it — overage should feel like a signal of heavy value, not a surprise penalty.

An outcome anchor. Wherever possible, name the unit in terms of value delivered, not tokens consumed. "Per resolved conversation," "per published asset," "per completed workflow." This is the single most important move, and I'll explain why.

Newsletter

Like this decoded read? Get the next one in your inbox.

One sharp take on the latest in AI for PMs, founders and operators. Every few days. Free.

Free forever. Unsubscribe anytime. No spam.

Decouple your price from your cost

If your price is tokens-plus-margin, you've handed control of your business to the labs. When a model gets 4x cheaper — and it will, on a roughly annual cadence — customers expect that savings. When a model gets more expensive or you're forced onto a pricier tier for quality, you eat it.

Outcome pricing severs that link. A customer paying $2 per resolved support ticket does not know or care whether that ticket cost you 12 cents or 3 cents in inference. When the model gets cheaper, your margin expands and you keep it. When you optimize your harness — caching, routing, smaller models for easy cases — that gain is yours too.

The companies with the healthiest AI margins in 2026 are not the ones with the cheapest models. They're the ones whose customers have no idea what a token costs.

The margin optimization stack

Once you've fixed pricing, the second lever is COGS itself. There's a real engineering discipline emerging here, and it's worth a dedicated owner:

Model routing. Most requests don't need your most expensive model. Route the easy 70% to a cheaper or on-device model and reserve the frontier model for genuinely hard cases. Done well, this halves COGS with no quality hit users can perceive.

Caching. Prompt caching and semantic caching of repeated queries are free money. If 30% of requests are near-duplicates, you're paying for them more than once.

Allowance shaping. Use rate limits and tiered features to keep abuse and runaway agents from torching your margins. An agent in a loop can burn a month of a customer's allowance in an afternoon.

This is now a profit center function, not a back-office concern. I've seen teams move gross margin from 45% to 68% in a quarter purely on routing and caching, without touching price.

What this means for forecasting

Stop pretending usage-based AI revenue forecasts like seat-based MRR. It doesn't, and forcing the old model onto it leads to bad decisions.

Model it like a consumption business instead. Track usage cohorts and how they expand inside an account over time. Watch the ratio of metered overage to base fee — a healthy AI product sees that ratio climb as customers deepen usage. Yes, it's lumpier than seats. But the expansion ceiling is dramatically higher, because you're no longer capped by how many humans the customer employs.

The takeaway

Per-seat pricing was a gift from an era where labor was the cost and software was free to serve. That era is over for AI products. Your costs are now variable, skewed, and partly out of your hands.

The response isn't to copy whatever the labs charge per token and slap on a markup. It's to charge for outcomes, decouple price from cost, and build real engineering muscle around bringing COGS down. Do that, and the broken ruler stops being a threat and starts being a moat — because your competitors are still pricing like it's 2019.

Frequently asked questions

Should AI products charge per seat or per usage? Neither, on its own. Per-seat collapses when one power user burns 50x the tokens of an average one. Pure usage scares buyers who can't predict their bill. The pattern that's working in 2026 is a seat or platform floor plus metered usage above a generous included allowance — predictable for the buyer, margin-safe for you.

Why are AI SaaS gross margins lower than traditional SaaS? Because inference is real, variable COGS that scales with usage. Traditional SaaS hit 80-90% gross margins because serving one more user cost almost nothing. AI products carry a per-request cost, so margins of 50-70% are common until you optimize routing, caching, and model selection.

Can I just pass model costs through to customers? You can, but it exposes you. If your price is a thin markup on tokens, every model price cut from the labs becomes a customer expectation for a discount, and every price hike becomes your problem. Decouple your price from your cost by charging for outcomes, not tokens.

How do I forecast revenue with usage-based AI pricing? You don't forecast it like seat-based MRR — you model it like a consumption business. Track usage cohorts, expansion within accounts, and the ratio of metered overage to base. Lumpier than seats, but the expansion ceiling is far higher.

Want more sharp takes on AI tooling? Browse the AI Decoded archive.

Frequently asked

Should AI products charge per seat or per usage?

Neither, on its own. Per-seat collapses when one power user burns 50x the tokens of an average one. Pure usage scares buyers who can't predict their bill. The pattern that's working in 2026 is a seat or platform floor plus metered usage above a generous included allowance — predictable for the buyer, margin-safe for you.

Why are AI SaaS gross margins lower than traditional SaaS?

Because inference is real, variable COGS that scales with usage. Traditional SaaS hit 80-90% gross margins because serving one more user cost almost nothing. AI products carry a per-request cost, so margins of 50-70% are common until you optimize routing, caching, and model selection.

Can I just pass model costs through to customers?

You can, but it exposes you. If your price is a thin markup on tokens, every model price cut from the labs becomes a customer expectation for a discount, and every price hike becomes your problem. Decouple your price from your cost by charging for outcomes, not tokens.

How do I forecast revenue with usage-based AI pricing?

You don't forecast it like seat-based MRR — you model it like a consumption business. Track usage cohorts, expansion within accounts, and the ratio of metered overage to base. Lumpier than seats, but the expansion ceiling is far higher.