
Stop Burning Tokens.

Real cost math for an 8-hour AI coding session at every model tier — plus when to cache, compact, or start fresh.

On this page
  1. Why your bill is asymmetric
  2. Model pricing matrix
  3. Caching: the 10× lever
  4. Compact vs start fresh
  5. When sub-agents pay off
  6. Live: session cost calculator
  7. Common pitfalls
TL;DR
  • Output costs 3–5× as much as input, depending on the model. Ask for diffs, not full file rewrites.
  • Cached reads cost 10% of the fresh input price. Structure prompts so the long parts (instructions, codebase context) are cacheable.
  • Sonnet for the boring 80%, Opus for the hard 20%. Picking the right tier per task is the biggest single win.
  • Restart > compact when the next task is unrelated. The "long session" is a habit, not a benefit.
CH 01

Why your bill is asymmetric.

Every model has three prices, not one:

  • Input — text you send (your prompt, the conversation history, file contents).
  • Output — text the model generates (code, explanations, tool calls).
  • Cached input — input the provider has already seen and stored.

Output is roughly 3–5× the price of fresh input, and cached input is roughly 10% of the fresh input price. So a one-shot "rewrite this file" can cost more than a full hour of careful patches. Rewriting forces the model to generate every line as output, while patching has it generate only a tiny diff.

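Mental model: cost per turn ≈ output tokens (your diff size) × output rate + context tokens × effective input rate. The effective input rate falls toward the cached price as more of the context hits the cache. Optimize for small diffs and cache-warm context, not "the smallest possible model."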
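The mental model can be written as a small Python function. This is a sketch under stated assumptions. `turn_cost` is a hypothetical helper. The cache hit rate is treated as the fraction of context tokens read from cache. Any provider surcharge for writing the cache is ignored. The 40k-token context and the 8k-token rewrite are made-up sizes. The rates are Sonnet 4.6's, taken from the pricing matrix in chapter 2.

```python
def turn_cost(context_tokens: int, output_tokens: int,
              input_per_m: float, cached_per_m: float, output_per_m: float,
              cache_hit_rate: float) -> float:
    """Approximate USD cost of one agent turn.

    Assumption: `cache_hit_rate` is the fraction of context tokens read from
    cache; the rest are billed at the fresh input rate. Cache-write
    surcharges are ignored.
    """
    effective_input = (cache_hit_rate * cached_per_m
                       + (1 - cache_hit_rate) * input_per_m)
    return (context_tokens * effective_input
            + output_tokens * output_per_m) / 1_000_000


# Sonnet 4.6 rates from the matrix: $3 input, $0.30 cached, $15 output (per 1M).
SONNET = dict(input_per_m=3.0, cached_per_m=0.30, output_per_m=15.0)

# Same 40k-token context, 70% cached: a 300-token diff vs an 8k-token rewrite.
patch = turn_cost(40_000, 300, cache_hit_rate=0.7, **SONNET)
rewrite = turn_cost(40_000, 8_000, cache_hit_rate=0.7, **SONNET)
print(f"patch ${patch:.4f}  rewrite ${rewrite:.4f}")
# patch $0.0489  rewrite $0.1644
```

With these assumed sizes, the rewrite costs more than three times as much as the patch. The context is identical in both turns, so the whole difference comes from output tokens.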

CH 02

Model pricing matrix (May 2026).

Prices per million tokens. Verify against the provider's page before you optimize anything serious — vendors adjust these often. The calculator below uses these defaults.

Model             | Input / 1M | Cached / 1M | Output / 1M | Sweet spot
Claude Opus 4.7   | $15        | $1.50       | $75         | Architecture, hard bugs, refactors
Claude Sonnet 4.6 | $3         | $0.30       | $15         | Default agent loop, ~80% of work
GPT-5.3 Codex     | $10        | $1          | $30         | Codex CLI batches, long autonomous runs
GPT-5.5           | $15        | $1.50       | $60         | Tricky reasoning, planning, hard math
Gemini 3.1 Pro    | $5         | $0.50       | $15         | Long-context (2M tok), search-aware
Composer 2.5 Fast | Included   | Included    | Included    | Quick tweaks in Cursor subscriptions

The Sonnet rule: if you can't articulate why Opus is needed for the next task, you don't need Opus. Sonnet handles ~80% of real coding work at one-fifth of Opus's per-token price. Reserve Opus for ambiguous architecture decisions, refactors spanning more than 10 files, and debugging that has already taken you longer than an hour.
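The Sonnet rule is simple enough to write down as a routing function. This is a minimal sketch. `Task` and `pick_model` are hypothetical names. The three triggers and their thresholds come from the rule above, and the output prices come from the matrix.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical task descriptor: one field per escalation trigger.
    ambiguous_architecture: bool = False
    files_touched: int = 1
    debugging_minutes: int = 0


def pick_model(task: Task) -> str:
    """Default to Sonnet; escalate to Opus only on an explicit trigger."""
    if (task.ambiguous_architecture
            or task.files_touched > 10
            or task.debugging_minutes > 60):
        return "Claude Opus 4.7"
    return "Claude Sonnet 4.6"


# Output price per 1M tokens from the matrix.
OUTPUT_PER_M = {"Claude Opus 4.7": 75.0, "Claude Sonnet 4.6": 15.0}

print(pick_model(Task(files_touched=3)))    # Claude Sonnet 4.6
print(pick_model(Task(files_touched=14)))   # Claude Opus 4.7
print(OUTPUT_PER_M["Claude Opus 4.7"] / OUTPUT_PER_M["Claude Sonnet 4.6"])  # 5.0
```

Making escalation an explicit branch keeps Opus an exception you can point to, not a default.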

CH 03

Caching: the 10× lever.

Anthropic, OpenAI, and Google all cache prefix-matching input. If turn 1 sends "system prompt + project rules + file A + question 1" and turn 2 sends "system prompt + project rules + file A + question 2", the prefix is reused. You pay 10% of the input rate for the cached chunk.

This sounds like free magic. It is, but it requires shaping your prompts so the long, static parts come first and the short, varying parts come last. Reverse the order and you cache nothing.

good prompt shape (cache-friendly)
┌──────────────────────────────────────────┐
│ STATIC PREFIX (cached after 1st turn)    │
├──────────────────────────────────────────┤
│   System prompt                          │
│   Project rules / AGENTS.md              │
│   File contents (the relevant 3-5 files) │
│   Tool definitions                       │
├──────────────────────────────────────────┤
│ VARIABLE SUFFIX (paid in full)           │
├──────────────────────────────────────────┤
│   This turn's user message               │
│   This turn's tool call results          │
└──────────────────────────────────────────┘
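A toy model of prefix caching shows why the order matters. It rests on simplifying assumptions. Tokens are stand-in strings, and the cacheable part of a turn is exactly the run of leading tokens it shares with the previous request. Real providers add details such as minimum cacheable prompt lengths, which this ignores. `common_prefix_len` and the prompt pieces are hypothetical.

```python
def common_prefix_len(prev: list[str], curr: list[str]) -> int:
    """Number of leading tokens two requests share (the cacheable prefix)."""
    n = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        n += 1
    return n


# Hypothetical prompt pieces; each string stands in for one token.
STATIC = ["<system>", "<AGENTS.md>", "<file A>", "<tools>"] * 250  # 1,000 tokens
Q1 = ["fix", "the", "login", "bug"]
Q2 = ["now", "add", "a", "test"]

# Cache-friendly: static prefix first, varying question last.
good = common_prefix_len(STATIC + Q1, STATIC + Q2)
# Reversed: the question comes first, so the very first token differs.
bad = common_prefix_len(Q1 + STATIC, Q2 + STATIC)

print(good, bad)  # 1000 0
```

At the chapter 2 rates, the good ordering reads those 1,000 tokens at the cached price. The reversed ordering pays the full input price for all of them, even though the two prompts contain exactly the same content.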

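In Claude Code, Cursor, and Codex CLI this happens automatically as long as you don't keep re-loading different files mid-conversation. The way to ruin caching: every turn, the agent reads a new file with cat and dumps it inline. Each dump is fresh input billed at the full rate. Whenever the files near the front of the prompt change, everything after the change misses the cache too. Keep that up every turn and your cache hit rate heads toward zero.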

Tactic: at the start of a session, ask the agent to read the files it needs once, summarize what it learned, and proceed. Re-load only on demand.

Pitfall · Cache TTL is 5 minutes by default

If you take a coffee break longer than the cache TTL (5 minutes by default on Anthropic, with an optional 1-hour TTL), the cached prefix expires and turn N+1 pays the full input price to warm it up again. Either keep moving, or explicitly opt into the 1-hour cache through whatever cache setting your tool exposes.
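A small simulation makes the coffee-break effect concrete. `cache_hits` is a hypothetical helper, and the model is deliberately simple. The first turn always writes the cache. Every later turn refreshes the TTL, which is how Anthropic describes its default cache; check your provider. Partial hits are ignored.

```python
def cache_hits(turn_times_min: list[float], ttl_min: float = 5.0) -> list[bool]:
    """For each turn, whether the shared prefix is still cached.

    Assumptions (a sketch, not any provider's exact policy): the first turn
    writes the cache, and every later turn refreshes the TTL, so a turn hits
    only if it arrives within `ttl_min` of the previous turn.
    """
    hits = []
    last = None
    for t in turn_times_min:
        hits.append(last is not None and t - last <= ttl_min)
        last = t
    return hits


# Turns at minutes 0, 3, 6, then a 12-minute coffee break, then minute 20.
times = [0, 3, 6, 18, 20]
print(cache_hits(times))              # [False, True, True, False, True]
print(cache_hits(times, ttl_min=60))  # [False, True, True, True, True]
```

With a 5-minute TTL, the break costs one full-price warm-up turn. With a 1-hour TTL, the same break is free.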

CH 04

Compact vs start fresh.

"/compact" is the most overused command in Claude Code. People run it because the UI shows context filling up — but the question is never "how full is the context", it's "does any of this old context still help with the next thing?"

Situation                                                     | Compact? | Start fresh?
Same feature, you're 40 turns in, want to keep going          | Yes      | No
Switching from "build auth" to "fix CSS bug"                  | No       | Yes
Agent is repeating wrong answers across turns                 | No       | Yes
Context is 80% full, but the relevant 20% is still useful     | Yes      | No
You can't summarize what the session is about in one sentence | No       | Yes

The rule of thumb: compacting always loses some signal. If the agent is already losing signal because stale turns are crowding out what matters, compact will help. If everything in context is still relevant, compact will hurt, and you'll re-load the same files anyway.
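The table boils down to one question: does the old context still help with the next thing? Here it is sketched as a function. The three boolean inputs are a hypothetical simplification of the rows above, and deciding what counts as "the same task" remains a judgment call.

```python
def compact_or_fresh(same_task: bool, agent_repeating_errors: bool,
                     summarizable_in_one_sentence: bool) -> str:
    """Keep old context (compact) only if it still helps with the next thing."""
    # Any of these means the old context no longer helps: restart.
    if (not same_task
            or agent_repeating_errors
            or not summarizable_in_one_sentence):
        return "start fresh"
    # Same task, agent on track, session still coherent: compact and continue.
    return "compact"


print(compact_or_fresh(True, False, True))   # compact (40 turns into one feature)
print(compact_or_fresh(False, False, True))  # start fresh (auth -> CSS bug)
print(compact_or_fresh(True, True, True))    # start fresh (repeating wrong answers)
```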

CH 05

When sub-agents pay off.

Sub-agents look like a free productivity boost until you do the math. Spawning a sub-agent means duplicating the system prompt and tool definitions, then carrying its return message in your main context. That overhead is a real cost.

The math works out when:

  • Research collapses into a summary. Sub-agent reads 50k tokens of docs, returns 500 tokens of answer. You spend 50k in the sub-agent's context, 500 in yours. Win.
  • Tasks parallelize cleanly. Four sub-agents on four independent files in 5 minutes vs four sequential turns in 20 minutes. Wall-clock win, often a cost win because each sub-agent's context is smaller than your shared one would be.
  • You want to throw away the working notes. The sub-agent's whole context dies when it returns. You keep only the answer.

The math fails when:

  • The work is sequential. Three steps where each depends on the last. A sub-agent buys you nothing.
  • The sub-agent needs your context to do its job. If you have to brief it for 5 minutes, you've already lost.
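Here is a rough cost sketch for the research case. Everything beyond the 50k-token docs and the 500-token answer is an assumption:

- a 10k-token duplicated system prompt plus tool definitions
- a 300-token brief
- 20 remaining turns in the main session
- a single read of the docs
- Sonnet 4.6 rates from the matrix

`inline_cost` and `subagent_cost` are hypothetical helpers. They count only the tokens that differ between the two approaches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    # USD per 1M tokens.
    input: float
    cached: float
    output: float


SONNET = Prices(input=3.0, cached=0.30, output=15.0)  # from the matrix


def inline_cost(doc_tokens: int, remaining_turns: int, p: Prices) -> float:
    """Docs land in the main context once at full price, then are
    re-read from cache on every later turn of the session."""
    return (doc_tokens * p.input
            + doc_tokens * p.cached * (remaining_turns - 1)) / 1e6


def subagent_cost(doc_tokens: int, answer_tokens: int, overhead_tokens: int,
                  brief_tokens: int, remaining_turns: int, p: Prices) -> float:
    """The sub-agent reads the docs in its own throwaway context; only the
    answer is carried (cached) through the main session."""
    brief = brief_tokens * p.output                   # main agent writes the brief
    sub_in = (overhead_tokens + brief_tokens + doc_tokens) * p.input
    sub_out = answer_tokens * p.output                # sub-agent writes the answer
    carried = (answer_tokens * p.input
               + answer_tokens * p.cached * (remaining_turns - 1))
    return (brief + sub_in + sub_out + carried) / 1e6


# 50k tokens of docs -> 500-token answer, carried for 20 more turns.
print(round(inline_cost(50_000, 20, SONNET), 3))                      # 0.435
print(round(subagent_cost(50_000, 500, 10_000, 300, 20, SONNET), 3))  # 0.197
```

Under these assumptions, the sub-agent route costs less than half as much, because 500 tokens ride along on every later turn instead of 50k. Set `remaining_turns=1` and the inline read comes out cheaper: nothing gets carried forward, so the sub-agent's overhead is pure cost.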
DEMO · INTERACTIVE

Live: session cost calculator.

Pick a model, drag the sliders to match what your day actually looks like. Numbers update live. All math runs in your browser — nothing leaves the page.

The calculator reports cost per session, per day, and per month (22 working days), in USD. Verify prices against vendor pages before relying on the numbers.

Things to try: drop the cache hit rate from 70% to 0% to see why it matters. Switch from Sonnet to Opus on the same workload: that is a 5× jump, since every Opus rate in the matrix is 5× Sonnet's. Drop output from 1.5k to 0.5k tokens by asking for diffs, not rewrites.
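Here is a minimal offline sketch of the same arithmetic for anyone without the interactive widget. It makes three assumptions:

- Each turn re-sends a fixed-size context at a blended cached/fresh rate.
- A day is sessions × per-session cost.
- A month is 22 days, matching the calculator.

The prices are copied from the matrix. Apart from the 70% cache rate and 1.5k output tokens mentioned above, the workload numbers (60 turns, a 30k-token context, 4 sessions per day) are hypothetical, not the widget's defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    input: float   # USD per 1M fresh input tokens
    cached: float  # USD per 1M cached input tokens
    output: float  # USD per 1M output tokens


# Rates from the pricing matrix (verify before relying on them).
MODELS = {
    "Claude Opus 4.7": Prices(15.0, 1.50, 75.0),
    "Claude Sonnet 4.6": Prices(3.0, 0.30, 15.0),
    "GPT-5.3 Codex": Prices(10.0, 1.0, 30.0),
    "GPT-5.5": Prices(15.0, 1.50, 60.0),
    "Gemini 3.1 Pro": Prices(5.0, 0.50, 15.0),
}


def session_cost(p: Prices, turns: int, context_tokens: int,
                 output_tokens: int, cache_hit_rate: float) -> float:
    """USD for one session: every turn re-sends the context and generates output."""
    per_turn_input = context_tokens * (cache_hit_rate * p.cached
                                       + (1 - cache_hit_rate) * p.input)
    per_turn_output = output_tokens * p.output
    return turns * (per_turn_input + per_turn_output) / 1e6


def report(model: str, sessions_per_day: int = 4, **workload) -> dict:
    per_session = session_cost(MODELS[model], **workload)
    per_day = per_session * sessions_per_day
    return {"per_session": per_session, "per_day": per_day,
            "per_month": per_day * 22}  # 22 days, as in the calculator


# Hypothetical workload: 60 turns, 30k context, 1.5k output per turn, 70% cached.
workload = dict(turns=60, context_tokens=30_000, output_tokens=1_500,
                cache_hit_rate=0.7)
for name in ("Claude Sonnet 4.6", "Claude Opus 4.7"):
    r = report(name, **workload)
    print(f"{name}: ${r['per_session']:.2f}/session, ${r['per_month']:.2f}/month")
# Claude Sonnet 4.6: $3.35/session, $294.62/month
# Claude Opus 4.7: $16.74/session, $1473.12/month
```

Setting `cache_hit_rate=0.0` or `output_tokens=500` in `workload` reproduces the other two experiments above.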

PITFALLS

Common pitfalls.

Picking Opus for everything "to be safe"

This is the single biggest waste of money in AI coding. Opus is 5× the price of Sonnet for tasks where the quality delta is <10%. Default to Sonnet, escalate explicitly.

Letting the agent re-read the codebase every prompt

You'll see cat src/**/*.ts or 50 file reads in the agent's transcript. That kills your cache. Get it to summarize the relevant files once at the start of the session, then reference the summary.

"Rewrite the file" instead of "edit lines 40-65"

Modern tools default to patch-style edits via apply-diff tools. If yours is rewriting whole files, check whether the diff tool is enabled. Output tokens drop 5–20× the moment you switch.
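To see the gap, compare the size of a full rewrite with a unified diff of the same change. This sketch uses Python's standard `difflib` on a hypothetical 400-line file, `app.ts`. It approximates tokens as characters divided by 4, which is only a rule of thumb; real tokenizers differ.

```python
import difflib

# Hypothetical 400-line file where only lines 40-65 change.
original = [f"const value{i} = compute({i});\n" for i in range(1, 401)]
edited = list(original)
for i in range(39, 65):  # 0-based indices for lines 40-65
    edited[i] = edited[i].replace("compute", "computeCached")

rewrite = "".join(edited)  # what a full rewrite has to emit
patch = "".join(difflib.unified_diff(original, edited,
                                     fromfile="a/app.ts", tofile="b/app.ts"))

# Rough heuristic (an assumption): ~4 characters per token.
rewrite_tokens = len(rewrite) // 4
patch_tokens = len(patch) // 4
print(rewrite_tokens, patch_tokens, round(rewrite_tokens / patch_tokens, 1))
```

On this toy file the patch comes out several times smaller. The ratio grows as the file gets larger relative to the edit, because the diff stays the same size while the rewrite does not.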

Forgetting that thinking tokens bill as output

Extended thinking / reasoning tokens count as output. A 30k-token thinking block on Opus is $2.25 by itself. Use thinking for hard problems, not for "summarize this email."
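The $2.25 figure is just thinking tokens × the model's output rate. Here is a minimal sketch with prices copied from the matrix. `thinking_cost` is an illustrative helper, and it assumes every thinking token is billed at the normal output rate, as stated above.

```python
# Output prices per 1M tokens from the pricing matrix.
OUTPUT_PER_M = {"Claude Opus 4.7": 75.0, "Claude Sonnet 4.6": 15.0, "GPT-5.5": 60.0}


def thinking_cost(thinking_tokens: int, model: str) -> float:
    """Reasoning tokens are billed as output, so they use the output rate."""
    return thinking_tokens * OUTPUT_PER_M[model] / 1_000_000


for model in OUTPUT_PER_M:
    print(f"30k thinking tokens on {model}: ${thinking_cost(30_000, model):.2f}")
# 30k thinking tokens on Claude Opus 4.7: $2.25
# 30k thinking tokens on Claude Sonnet 4.6: $0.45
# 30k thinking tokens on GPT-5.5: $1.80
```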

What to read next.

  • Guide · 03: Cursor for Web Developers. Picking the right model per task is the biggest single cost lever. Cursor's model picker explained.
  • Tool: OpenCode. BYO key, swap providers freely. Useful when you want full control over which model runs which loop.
  • Skill: Modern CSS SKILL.md. Smaller diffs = lower output cost. A skill that teaches your agent modern CSS prevents wasteful rewrites.
Changelog
  • 2026-05-18: Initial publish. Pricing matrix verified against vendor pages May 2026.
© 2026 WEB DEVELOPER / ALL RIGHTS RESERVED