Claude Haiku vs Sonnet is a routing decision, not a beauty contest. As of July 2026 the rates are: Claude Haiku 4.5 at $1 per million input tokens and $5 per million output, and Claude Sonnet 5 — released June 30, 2026 — at an introductory $2/$10 through August 31, 2026, rising to $3/$15 after that. The decision rule we run in production is short. Haiku is the default for bounded work: extraction, classification, tool chains of five calls or fewer. Sonnet earns its higher rate on exactly three things: long-horizon planning, ambiguous instructions, and tasks where the model must notice its own mistakes. Most agent fleets end up running both, with an explicit escalation rule between them.

"Which is better" has an easy answer — Sonnet, obviously; it costs three times as much for a reason. It is also the wrong question. Send everything to Sonnet and you overpay by 50-70% on the bounded majority of your traffic. Send everything to Haiku and you quietly ship wrong answers on the long-horizon minority. The interesting engineering lives at the boundary.

Scope note: this page decides which model a task shape deserves. The router itself — code, thresholds, per-tier cost tracking — lives in our guide to model routing and cost per task. Read this one first; implement with that one.

What Changed in the Haiku vs Sonnet Question This Summer

For most of the past year the comparison was Haiku 4.5 against Sonnet 4.5, then 4.6 — both Sonnets at $3/$15, a clean 3x sticker gap. Claude Sonnet 5 shipped June 30, 2026 on the Claude API and Amazon Bedrock the same day, and moved two goalposts at once. First, price: $2/$10 through August 31, 2026, then back to $3/$15. Second, the tokenizer: Sonnet 5's new tokenizer produces roughly 30% more tokens for the same text (Anthropic's published range: 1.0-1.35x by content), while Haiku 4.5 keeps the old one. A naive dollars-per-million comparison is therefore rigged — the two models do not count a million tokens the same way. Every figure below corrects for it.

Haiku 4.5 has not moved since October 2025: coding performance similar to Claude Sonnet 4 "at one-third the cost and more than twice the speed," per Anthropic's launch claim, 73.3% on SWE-bench Verified, still the cheapest and fastest current-generation Claude. Sonnet 5's positioning is equally blunt: performance close to Opus 4.8 at Sonnet prices, and — in AWS's launch words — a more reliable backbone for multi-step tool use.

Claude Haiku 4.5 vs Sonnet 5: The Spec Sheet, Date-Stamped

Every rate verified against Anthropic's pricing docs and the Amazon Bedrock pricing page on July 3, 2026. Reading this after August 31, 2026? The promo is over; use the standard column.

SpecClaude Haiku 4.5Claude Sonnet 5
Input / output per 1M tokens$1 / $5$2 / $10 to Aug 31, 2026; $3 / $15 from Sep 1
Batch API (50% off)$0.50 / $2.50$1 / $5 intro; $1.50 / $7.50 standard
Cache read (0.1x input)$0.10 per 1M$0.20 intro; $0.30 standard
Context window200K tokens1M tokens at standard per-token rates
Max output64K tokens128K tokens
Reasoning modeExtended thinking, manual budgetsAdaptive thinking, effort parameter (defaults high)
Latency classFastest in the familyFast
Reliable knowledge cutoffFeb 2025Jan 2026
TokenizerClaude 4-eraNew; ~30% more tokens per text
Bedrock model IDanthropic.claude-haiku-4-5-20251001-v1:0anthropic.claude-sonnet-5

Both support tool use, computer use, prompt caching, and the Batch API; on Bedrock both offer global and regional endpoints, regional at a 10% premium. The full rate table with Opus and every modifier lives in our Amazon Bedrock pricing guide. Three asymmetries above decide more routing arguments than any benchmark.

The context window is a hard wall

Haiku caps at 200K tokens. Sonnet 5 takes 1M at standard per-token rates. Anything holding a large repo or a long document set routes to Sonnet by physics, not preference.

The tokenizer discount that isn't

At the $2 intro rate, Sonnet 5 consumes ~1.3x the tokens on the same text, so the effective gap over Haiku is about 2.6x, not 2x. At the $3 standard rate: about 3.9x, not 3x. The promo makes Sonnet 5 feel closer to Haiku than it is — and September makes the gap wider than the Sonnet 4.6 era ever was.

Thinking modes are not interchangeable

Haiku 4.5 exposes classic extended thinking: you set a token budget, it spends up to that. Sonnet 5 uses adaptive thinking steered by an effort parameter that defaults to high on the API. Port a Haiku prompt across and output cost can jump past what the rate card predicts — check effort first. We got burned in week one: same eval suite, 41% more output tokens, all thinking.

Route by Task Shape, Not by Benchmark

Benchmarks measure a model alone in a room. Agent cost is the model times the task shape — turns, context growth, how recoverable a mistake is. Classify your traffic into shapes and most of the Claude Haiku vs Sonnet argument evaporates. It is the discipline that makes agentic AI costable at all.

Hand-drawn style diagram routing five agent task shapes to the right Claude model: bounded extraction, classification, and short tool chains route to Haiku 4.5 at 1 and 5 dollars per million tokens; multi-step planning and long-document work route to Sonnet 5; a dashed escalation arrow runs from Haiku to Sonnet for outputs that fail verification.
Hand-drawn style diagram routing five agent task shapes to the right Claude model: bounded extraction, classification, and short tool chains route to Haiku 4.5 at 1 and 5 dollars per million tokens; multi-step planning and long-document work route to Sonnet 5; a dashed escalation arrow runs from Haiku to Sonnet for outputs that fail verification.

Bounded extraction: Haiku, no debate

Pull fields from an invoice, normalize an address, parse a log line into JSON. Self-contained input, schema-checkable output, cheap-to-detect failures. This shape is 40-60% of traffic in most fleets we have reviewed. Paying Sonnet rates for it is burning money.

Classification and routing: Haiku, batched if you can

Intent detection, ticket triage, tool selection, judge calls in an eval harness. The first time we moved a triage agent from Sonnet to Haiku, the bill dropped by more than half and the confusion matrix did not move — triage is classification wearing a trench coat. Overnight classification at batch rates ($0.50 per million input) is close to free at most volumes.

Short tool chains: Haiku with guardrails

Three to five well-typed tool calls with a clear stopping condition: look up the order, check the policy, issue the credit, confirm. Anthropic shipped Haiku 4.5 explicitly as the sub-agent workhorse; its computer-use scores beat the old Sonnet 4. The guardrails matter, though: typed schemas, a turn ceiling, a loop detector.

Multi-step planning: Sonnet, and do not fight it

Eight-plus turns, dependencies between steps, a plan that must survive contact with intermediate results. This is what Sonnet 5 was built for — AWS's launch post leads with complex dependency chains and multi-step tool use. Haiku starts these tasks impressively and loses the thread quietly — too quietly to catch without instrumentation, which is what makes it expensive.

Coding: split the work, not the difference

Haiku 4.5 scores 73.3% on SWE-bench Verified — Sonnet 4-class coding at a third of the price, per Anthropic's framing. Bounded edits, review comments, test generation, lint-fix loops: Haiku. Multi-file refactors, architecture calls, anything holding a plan across a long session: Sonnet 5. Routing coding by scope beats picking one model, every time we have measured it.

The Cost-Per-Task Math

Sticker rates mislead because agents re-send the growing conversation every turn: input compounds, output mostly does not. Here is the same pair of tasks on both models, in Haiku-tokenizer counts with Sonnet 5's counts inflated 1.3x, no caching. A 3-turn extraction task (system prompt plus tools ~3K tokens, two tool results folded in) comes to roughly 13,500 cumulative input tokens and 700 output; a 12-turn agent task (4K base, ~1K of new context per turn) to roughly 114,000 input and 3,000 output.

TaskHaiku 4.5Sonnet 5 intro (to Aug 31, 2026)Sonnet 5 standard (from Sep 1)
3-turn extraction$0.017$0.044 (2.6x)$0.066 (3.9x)
12-turn agent task$0.13$0.34 (2.6x)$0.50 (3.9x)
12-turn, Haiku-first, escalate the 20% that fail$0.23 per successful task

Hand-drawn style bar chart pricing the same 12-turn agent task on three rate cards: Claude Haiku 4.5 at 13 cents, Claude Sonnet 5 introductory pricing at 34 cents through August 31 2026, and Sonnet 5 standard pricing at 50 cents, with a note that Haiku-first routing with escalation on the 20 percent of failures lands at about 23 cents per successful task.
Hand-drawn style bar chart pricing the same 12-turn agent task on three rate cards: Claude Haiku 4.5 at 13 cents, Claude Sonnet 5 introductory pricing at 34 cents through August 31 2026, and Sonnet 5 standard pricing at 50 cents, with a note that Haiku-first routing with escalation on the 20 percent of failures lands at about 23 cents per successful task.

A 3-turn extraction task

$0.017 versus $0.044. At 200,000 extractions a month that is $3,400 versus $8,800 — and $13,200 once the promo ends. Nothing about extraction quality justifies the difference. The delta is pure routing profit.

A 12-turn agent task

$0.13 versus $0.34 versus $0.50. The ratio does not improve with task length — input compounding hits both models the same way. What changes with length is the stakes: at 12 turns, Haiku's failure rate on planning-heavy work stops being a rounding error.

Failure-adjusted cost: the number that matters

Say Haiku completes a given 12-turn shape 80% of the time — roughly what we measured on a mid-complexity ops workflow; measure yours with a real agent evaluation harness. Route Haiku-first, escalate detected failures to Sonnet at standard rates: $0.13 + 0.2 x $0.50 = $0.23 per successful task, 54% below all-Sonnet. At 100,000 tasks a month, roughly $27,000 saved. The catch is the word "detected." Pure token math favors Haiku-first at almost any failure rate; what flips the verdict is whether failures are cheap to catch. If they are not, every miss ships to a user. No rate card prices that.

Two footnotes. Caching does not change the verdict — cache reads cost 0.1x input on both models, so the ratio holds; the TTL and breakpoint mechanics live in our prompt caching guide. And through August 31, batched Sonnet 5 carries the same sticker rate as realtime Haiku ($1/$5) — a two-month window worth exploiting for background long-horizon jobs. How these levers stack on AWS is in our Claude on Bedrock cost breakdown.

Where Haiku 4.5 Degrades

Not vibes — the four failure modes we have caught in transcripts.

Plan retention past turn eight

On long tasks Haiku starts strong and drifts. The signature: around turn 8-12 it re-executes a completed step or silently drops a subtask. We watched a Haiku migration agent re-run a finished schema step at turn 9 — the plan had scrolled out of its working attention. Sonnet 5's launch positioning, maintaining plans across stages with fewer correction rounds, names exactly this gap.

Ambiguity gets resolved by guessing

Give Haiku an underspecified instruction and it picks the most literal reading and commits, fast. It rarely stops to ask. Fine on bounded tasks, even desirable. Where the wrong interpretation costs ten downstream steps, it is the root cause in a plurality of our failure reviews.

Tool errors pass silently

An empty search result, a 404 wrapped in a 200, a tool returning an error string instead of raising — Haiku tends to accept the result and keep moving. Sonnet is measurably more likely to notice, retry with adjusted arguments, or flag it. If your tools have messy failure semantics, Haiku needs deterministic checks between calls.

Self-correction is shallow

When Haiku's first approach fails, its retry is usually the same approach, rephrased. Diagnosing the failure and switching strategy is precisely what the higher rate buys. The cleanest single rule we can offer: if the task requires the model to recover from its own mistakes, do not send Haiku alone.

Latency Budgets: When Speed Decides the Argument

Sometimes price is a distraction and speed is the whole decision. Haiku 4.5 is the fastest model in the lineup — Anthropic measured 4-5x Sonnet 4.5's throughput at launch — and for interactive agents that outweighs the rate difference. A support agent answering in 2 seconds instead of 6 changes user behavior; a laggy code-completion sub-agent gets turned off.

The reverse also holds. Background pipelines have latency budgets measured in hours; there the Batch API's 50% discount dominates and the only question left is the failure-adjusted math above. Our shorthand: interactive and bounded, Haiku. Background and bounded, Haiku batched. Interactive and long-horizon, Sonnet — eat the latency. Background and long-horizon, Sonnet batched.

The Escalation Rule: Route by Verifiability

The hard question is not "can Haiku do this task." It is "will I know when it didn't." Our procedure, in order:

  1. Classify the task shape at intake. Extraction, classification, short chain, planning, long-document. A one-line Haiku call can do this itself for a fraction of a cent.
  2. Check the hard constraints. Context over 200K, output over 64K, or knowledge past February 2025 without retrieval: Sonnet, no further analysis.
  3. Default by shape. Bounded shapes to Haiku, planning shapes to Sonnet, coding split by scope.
  4. Wrap every Haiku output in a verifier. Schema validation and deterministic checks first — they are free. For fuzzier correctness, a cheap LLM-as-a-judge call against a binary rubric; the judge can be Haiku too.
  5. Escalate on any tripped signal. Verifier failure, a repeated tool call with identical arguments (the loop signature), or a turn count 50% over budget. Re-run on Sonnet and log it — a rising escalation rate on one shape means your intake classifier is mislabeling it.

That is the policy. The implementation — router code, threshold tuning, escalation telemetry — is the whole subject of our Claude model routing guide, which picks up exactly where this page stops.

When Neither Model Is Right

Some tasks fail on Sonnet too: research-grade synthesis across contradictory sources, architecture decisions with long dependency chains, agent runs that must stay coherent for hours. That is Opus territory — Opus 4.8 at $5/$25 — and pretending Sonnet covers it produces the same silent-failure pattern as pretending Haiku covers planning. Genuinely Opus-shaped work is under 5% of fleet traffic in our experience; it is just the 5% where being wrong costs the most.

There is also a boring middle option: Sonnet 4.6 at $3/$15, still on the old tokenizer, still with extended thinking. A team mid-quarter on a calibrated eval baseline has a defensible reason to stay put until September rather than re-baseline during a promo window.

The one-line verdicts, for the tab you keep open:

  • Bounded extraction and classification: Haiku 4.5, batched where possible.
  • Tool chains of five calls or fewer: Haiku 4.5 behind a verifier.
  • Planning past eight turns, ambiguity, self-correction: Sonnet 5.
  • Anything over 200K context: Sonnet 5, by physics.
  • Interactive UX under a 2-second budget: Haiku 4.5.
  • The 5% that keeps failing on Sonnet: Opus 4.8, routed deliberately.

Frequently Asked Questions

Is Claude Haiku 4.5 better than Sonnet 4.5?

For bounded agent work, mostly yes on value: Haiku 4.5 scores 73.3% on SWE-bench Verified — a few points behind Sonnet 4.5 — at one-third the price and 4-5x the speed, per Anthropic's launch post. Sonnet stays ahead on long-horizon planning and self-correction. Note that Sonnet 5 replaced 4.5 as the default Sonnet in June 2026.

How much cheaper is Claude Haiku than Sonnet per task?

After correcting for Sonnet 5's tokenizer (~1.3x more tokens for the same text), a multi-turn agent task costs about 2.6x less on Haiku during the intro window and about 3.9x less from September 1, 2026 — $0.13 versus $0.34-$0.50 on our worked 12-turn example. Haiku-first routing with escalation usually lands 40-60% below an all-Sonnet fleet.

What is Claude Sonnet 5's pricing and when does the introductory rate end?

$2 per million input tokens and $10 per million output through August 31, 2026; from September 1 the standard rate is $3/$15. Batch rates are half that, and the 1M-token context window bills at standard per-token pricing with no long-context premium.

Does Claude Haiku 4.5 support extended thinking and computer use?

Yes to both. Haiku 4.5 supports classic extended thinking with a manual token budget (not the adaptive-thinking system Sonnet 5 uses), plus computer use, tool use, prompt caching, and the Batch API. Its limits are structural: a 200K context window, 64K max output, and a reliable knowledge cutoff of February 2025.

When should an agent escalate from Haiku to Sonnet?

On a tripped verifier, a repeated tool call with identical arguments, or a turn count well past budget. Deterministic checks catch most failures for free; a cheap judge call covers the fuzzy remainder. If one shape's escalation rate keeps climbing, change its default to Sonnet.

References

  1. Anthropic — Introducing Claude Haiku 4.5. https://www.anthropic.com/news/claude-haiku-4-5
  2. Anthropic — Introducing Claude Sonnet 5. https://www.anthropic.com/news/claude-sonnet-5
  3. Anthropic — Claude Platform pricing documentation. https://platform.claude.com/docs/en/about-claude/pricing
  4. Anthropic — Models overview (specs, context windows, model IDs). https://platform.claude.com/docs/en/about-claude/models/overview
  5. AWS — Amazon Bedrock pricing. https://aws.amazon.com/bedrock/pricing/
  6. AWS Machine Learning Blog — Introducing Claude Sonnet 5 on AWS. https://aws.amazon.com/blogs/machine-learning/introducing-claude-sonnet-5-on-aws-anthropics-most-capable-sonnet-model/