Claude Haiku vs Sonnet is a routing decision, not a beauty contest. As of July 2026 the rates are: Claude Haiku 4.5 at $1 per million input tokens and $5 per million output, and Claude Sonnet 5 — released June 30, 2026 — at an introductory $2/$10 through August 31, 2026, rising to $3/$15 after that. The decision rule we run in production is short. Haiku is the default for bounded work: extraction, classification, tool chains of five calls or fewer. Sonnet earns its higher rate on exactly three things: long-horizon planning, ambiguous instructions, and tasks where the model must notice its own mistakes. Most agent fleets end up running both, with an explicit escalation rule between them.
"Which is better" has an easy answer — Sonnet, obviously; it costs three times as much for a reason. It is also the wrong question. Send everything to Sonnet and you overpay by 50-70% on the bounded majority of your traffic. Send everything to Haiku and you quietly ship wrong answers on the long-horizon minority. The interesting engineering lives at the boundary.
Scope note: this page decides which model a task shape deserves. The router itself — code, thresholds, per-tier cost tracking — lives in our guide to model routing and cost per task. Read this one first; implement with that one.
What Changed in the Haiku vs Sonnet Question This Summer
For most of the past year the comparison was Haiku 4.5 against Sonnet 4.5, then 4.6 — both Sonnets at $3/$15, a clean 3x sticker gap. Claude Sonnet 5 shipped June 30, 2026 on the Claude API and Amazon Bedrock the same day, and moved two goalposts at once. First, price: $2/$10 through August 31, 2026, then back to $3/$15. Second, the tokenizer: Sonnet 5's new tokenizer produces roughly 30% more tokens for the same text (Anthropic's published range: 1.0-1.35x by content), while Haiku 4.5 keeps the old one. A naive dollars-per-million comparison is therefore rigged — the two models do not count a million tokens the same way. Every figure below corrects for it.
Haiku 4.5 has not moved since October 2025: coding performance similar to Claude Sonnet 4 "at one-third the cost and more than twice the speed," per Anthropic's launch claim, 73.3% on SWE-bench Verified, still the cheapest and fastest current-generation Claude. Sonnet 5's positioning is equally blunt: performance close to Opus 4.8 at Sonnet prices, and — in AWS's launch words — a more reliable backbone for multi-step tool use.
Claude Haiku 4.5 vs Sonnet 5: The Spec Sheet, Date-Stamped
Every rate verified against Anthropic's pricing docs and the Amazon Bedrock pricing page on July 3, 2026. Reading this after August 31, 2026? The promo is over; use the standard column.
| Spec | Claude Haiku 4.5 | Claude Sonnet 5 |
|---|---|---|
| Input / output per 1M tokens | $1 / $5 | $2 / $10 to Aug 31, 2026; $3 / $15 from Sep 1 |
| Batch API (50% off) | $0.50 / $2.50 | $1 / $5 intro; $1.50 / $7.50 standard |
| Cache read (0.1x input) | $0.10 per 1M | $0.20 intro; $0.30 standard |
| Context window | 200K tokens | 1M tokens at standard per-token rates |
| Max output | 64K tokens | 128K tokens |
| Reasoning mode | Extended thinking, manual budgets | Adaptive thinking, effort parameter (defaults high) |
| Latency class | Fastest in the family | Fast |
| Reliable knowledge cutoff | Feb 2025 | Jan 2026 |
| Tokenizer | Claude 4-era | New; ~30% more tokens per text |
| Bedrock model ID | anthropic.claude-haiku-4-5-20251001-v1:0 | anthropic.claude-sonnet-5 |
Both support tool use, computer use, prompt caching, and the Batch API; on Bedrock both offer global and regional endpoints, regional at a 10% premium. The full rate table with Opus and every modifier lives in our Amazon Bedrock pricing guide. Three asymmetries above decide more routing arguments than any benchmark.
The context window is a hard wall
Haiku caps at 200K tokens. Sonnet 5 takes 1M at standard per-token rates. Anything holding a large repo or a long document set routes to Sonnet by physics, not preference.
The tokenizer discount that isn't
At the $2 intro rate, Sonnet 5 consumes ~1.3x the tokens on the same text, so the effective gap over Haiku is about 2.6x, not 2x. At the $3 standard rate: about 3.9x, not 3x. The promo makes Sonnet 5 feel closer to Haiku than it is — and September makes the gap wider than the Sonnet 4.6 era ever was.
Thinking modes are not interchangeable
Haiku 4.5 exposes classic extended thinking: you set a token budget, it spends up to that. Sonnet 5 uses adaptive thinking steered by an effort parameter that defaults to high on the API. Port a Haiku prompt across and output cost can jump past what the rate card predicts — check effort first. We got burned in week one: same eval suite, 41% more output tokens, all thinking.
Route by Task Shape, Not by Benchmark
Benchmarks measure a model alone in a room. Agent cost is the model times the task shape — turns, context growth, how recoverable a mistake is. Classify your traffic into shapes and most of the Claude Haiku vs Sonnet argument evaporates. It is the discipline that makes agentic AI costable at all.
Bounded extraction: Haiku, no debate
Pull fields from an invoice, normalize an address, parse a log line into JSON. Self-contained input, schema-checkable output, cheap-to-detect failures. This shape is 40-60% of traffic in most fleets we have reviewed. Paying Sonnet rates for it is burning money.
Classification and routing: Haiku, batched if you can
Intent detection, ticket triage, tool selection, judge calls in an eval harness. The first time we moved a triage agent from Sonnet to Haiku, the bill dropped by more than half and the confusion matrix did not move — triage is classification wearing a trench coat. Overnight classification at batch rates ($0.50 per million input) is close to free at most volumes.
Short tool chains: Haiku with guardrails
Three to five well-typed tool calls with a clear stopping condition: look up the order, check the policy, issue the credit, confirm. Anthropic shipped Haiku 4.5 explicitly as the sub-agent workhorse; its computer-use scores beat the old Sonnet 4. The guardrails matter, though: typed schemas, a turn ceiling, a loop detector.
Multi-step planning: Sonnet, and do not fight it
Eight-plus turns, dependencies between steps, a plan that must survive contact with intermediate results. This is what Sonnet 5 was built for — AWS's launch post leads with complex dependency chains and multi-step tool use. Haiku starts these tasks impressively and loses the thread quietly — too quietly to catch without instrumentation, which is what makes it expensive.
Coding: split the work, not the difference
Haiku 4.5 scores 73.3% on SWE-bench Verified — Sonnet 4-class coding at a third of the price, per Anthropic's framing. Bounded edits, review comments, test generation, lint-fix loops: Haiku. Multi-file refactors, architecture calls, anything holding a plan across a long session: Sonnet 5. Routing coding by scope beats picking one model, every time we have measured it.
The Cost-Per-Task Math
Sticker rates mislead because agents re-send the growing conversation every turn: input compounds, output mostly does not. Here is the same pair of tasks on both models, in Haiku-tokenizer counts with Sonnet 5's counts inflated 1.3x, no caching. A 3-turn extraction task (system prompt plus tools ~3K tokens, two tool results folded in) comes to roughly 13,500 cumulative input tokens and 700 output; a 12-turn agent task (4K base, ~1K of new context per turn) to roughly 114,000 input and 3,000 output.
| Task | Haiku 4.5 | Sonnet 5 intro (to Aug 31, 2026) | Sonnet 5 standard (from Sep 1) |
|---|---|---|---|
| 3-turn extraction | $0.017 | $0.044 (2.6x) | $0.066 (3.9x) |
| 12-turn agent task | $0.13 | $0.34 (2.6x) | $0.50 (3.9x) |
| 12-turn, Haiku-first, escalate the 20% that fail | $0.23 per successful task | — | — |
A 3-turn extraction task
$0.017 versus $0.044. At 200,000 extractions a month that is $3,400 versus $8,800 — and $13,200 once the promo ends. Nothing about extraction quality justifies the difference. The delta is pure routing profit.
A 12-turn agent task
$0.13 versus $0.34 versus $0.50. The ratio does not improve with task length — input compounding hits both models the same way. What changes with length is the stakes: at 12 turns, Haiku's failure rate on planning-heavy work stops being a rounding error.
Failure-adjusted cost: the number that matters
Say Haiku completes a given 12-turn shape 80% of the time — roughly what we measured on a mid-complexity ops workflow; measure yours with a real agent evaluation harness. Route Haiku-first, escalate detected failures to Sonnet at standard rates: $0.13 + 0.2 x $0.50 = $0.23 per successful task, 54% below all-Sonnet. At 100,000 tasks a month, roughly $27,000 saved. The catch is the word "detected." Pure token math favors Haiku-first at almost any failure rate; what flips the verdict is whether failures are cheap to catch. If they are not, every miss ships to a user. No rate card prices that.
Two footnotes. Caching does not change the verdict — cache reads cost 0.1x input on both models, so the ratio holds; the TTL and breakpoint mechanics live in our prompt caching guide. And through August 31, batched Sonnet 5 carries the same sticker rate as realtime Haiku ($1/$5) — a two-month window worth exploiting for background long-horizon jobs. How these levers stack on AWS is in our Claude on Bedrock cost breakdown.
Where Haiku 4.5 Degrades
Not vibes — the four failure modes we have caught in transcripts.
Plan retention past turn eight
On long tasks Haiku starts strong and drifts. The signature: around turn 8-12 it re-executes a completed step or silently drops a subtask. We watched a Haiku migration agent re-run a finished schema step at turn 9 — the plan had scrolled out of its working attention. Sonnet 5's launch positioning, maintaining plans across stages with fewer correction rounds, names exactly this gap.
Ambiguity gets resolved by guessing
Give Haiku an underspecified instruction and it picks the most literal reading and commits, fast. It rarely stops to ask. Fine on bounded tasks, even desirable. Where the wrong interpretation costs ten downstream steps, it is the root cause in a plurality of our failure reviews.
Tool errors pass silently
An empty search result, a 404 wrapped in a 200, a tool returning an error string instead of raising — Haiku tends to accept the result and keep moving. Sonnet is measurably more likely to notice, retry with adjusted arguments, or flag it. If your tools have messy failure semantics, Haiku needs deterministic checks between calls.
Self-correction is shallow
When Haiku's first approach fails, its retry is usually the same approach, rephrased. Diagnosing the failure and switching strategy is precisely what the higher rate buys. The cleanest single rule we can offer: if the task requires the model to recover from its own mistakes, do not send Haiku alone.
Latency Budgets: When Speed Decides the Argument
Sometimes price is a distraction and speed is the whole decision. Haiku 4.5 is the fastest model in the lineup — Anthropic measured 4-5x Sonnet 4.5's throughput at launch — and for interactive agents that outweighs the rate difference. A support agent answering in 2 seconds instead of 6 changes user behavior; a laggy code-completion sub-agent gets turned off.
The reverse also holds. Background pipelines have latency budgets measured in hours; there the Batch API's 50% discount dominates and the only question left is the failure-adjusted math above. Our shorthand: interactive and bounded, Haiku. Background and bounded, Haiku batched. Interactive and long-horizon, Sonnet — eat the latency. Background and long-horizon, Sonnet batched.
The Escalation Rule: Route by Verifiability
The hard question is not "can Haiku do this task." It is "will I know when it didn't." Our procedure, in order:
- Classify the task shape at intake. Extraction, classification, short chain, planning, long-document. A one-line Haiku call can do this itself for a fraction of a cent.
- Check the hard constraints. Context over 200K, output over 64K, or knowledge past February 2025 without retrieval: Sonnet, no further analysis.
- Default by shape. Bounded shapes to Haiku, planning shapes to Sonnet, coding split by scope.
- Wrap every Haiku output in a verifier. Schema validation and deterministic checks first — they are free. For fuzzier correctness, a cheap LLM-as-a-judge call against a binary rubric; the judge can be Haiku too.
- Escalate on any tripped signal. Verifier failure, a repeated tool call with identical arguments (the loop signature), or a turn count 50% over budget. Re-run on Sonnet and log it — a rising escalation rate on one shape means your intake classifier is mislabeling it.
That is the policy. The implementation — router code, threshold tuning, escalation telemetry — is the whole subject of our Claude model routing guide, which picks up exactly where this page stops.
When Neither Model Is Right
Some tasks fail on Sonnet too: research-grade synthesis across contradictory sources, architecture decisions with long dependency chains, agent runs that must stay coherent for hours. That is Opus territory — Opus 4.8 at $5/$25 — and pretending Sonnet covers it produces the same silent-failure pattern as pretending Haiku covers planning. Genuinely Opus-shaped work is under 5% of fleet traffic in our experience; it is just the 5% where being wrong costs the most.
There is also a boring middle option: Sonnet 4.6 at $3/$15, still on the old tokenizer, still with extended thinking. A team mid-quarter on a calibrated eval baseline has a defensible reason to stay put until September rather than re-baseline during a promo window.
The one-line verdicts, for the tab you keep open:
- Bounded extraction and classification: Haiku 4.5, batched where possible.
- Tool chains of five calls or fewer: Haiku 4.5 behind a verifier.
- Planning past eight turns, ambiguity, self-correction: Sonnet 5.
- Anything over 200K context: Sonnet 5, by physics.
- Interactive UX under a 2-second budget: Haiku 4.5.
- The 5% that keeps failing on Sonnet: Opus 4.8, routed deliberately.
Frequently Asked Questions
Is Claude Haiku 4.5 better than Sonnet 4.5?
For bounded agent work, mostly yes on value: Haiku 4.5 scores 73.3% on SWE-bench Verified — a few points behind Sonnet 4.5 — at one-third the price and 4-5x the speed, per Anthropic's launch post. Sonnet stays ahead on long-horizon planning and self-correction. Note that Sonnet 5 replaced 4.5 as the default Sonnet in June 2026.
How much cheaper is Claude Haiku than Sonnet per task?
After correcting for Sonnet 5's tokenizer (~1.3x more tokens for the same text), a multi-turn agent task costs about 2.6x less on Haiku during the intro window and about 3.9x less from September 1, 2026 — $0.13 versus $0.34-$0.50 on our worked 12-turn example. Haiku-first routing with escalation usually lands 40-60% below an all-Sonnet fleet.
What is Claude Sonnet 5's pricing and when does the introductory rate end?
$2 per million input tokens and $10 per million output through August 31, 2026; from September 1 the standard rate is $3/$15. Batch rates are half that, and the 1M-token context window bills at standard per-token pricing with no long-context premium.
Does Claude Haiku 4.5 support extended thinking and computer use?
Yes to both. Haiku 4.5 supports classic extended thinking with a manual token budget (not the adaptive-thinking system Sonnet 5 uses), plus computer use, tool use, prompt caching, and the Batch API. Its limits are structural: a 200K context window, 64K max output, and a reliable knowledge cutoff of February 2025.
When should an agent escalate from Haiku to Sonnet?
On a tripped verifier, a repeated tool call with identical arguments, or a turn count well past budget. Deterministic checks catch most failures for free; a cheap judge call covers the fuzzy remainder. If one shape's escalation rate keeps climbing, change its default to Sonnet.
References
- Anthropic — Introducing Claude Haiku 4.5. https://www.anthropic.com/news/claude-haiku-4-5
- Anthropic — Introducing Claude Sonnet 5. https://www.anthropic.com/news/claude-sonnet-5
- Anthropic — Claude Platform pricing documentation. https://platform.claude.com/docs/en/about-claude/pricing
- Anthropic — Models overview (specs, context windows, model IDs). https://platform.claude.com/docs/en/about-claude/models/overview
- AWS — Amazon Bedrock pricing. https://aws.amazon.com/bedrock/pricing/
- AWS Machine Learning Blog — Introducing Claude Sonnet 5 on AWS. https://aws.amazon.com/blogs/machine-learning/introducing-claude-sonnet-5-on-aws-anthropics-most-capable-sonnet-model/