TL;DR
Anthropic released Claude Opus 4.8 on May 28, 2026 as a direct upgrade to Opus 4.7. Same price ($5 per million input tokens, $25 per million output tokens), better benchmarks across coding, agentic skills, reasoning, and knowledge work, and a new fast mode that runs at 2.5x the speed at three times less the cost of previous fast modes. The headline behavioural change is honesty: Anthropic reports Opus 4.8 is around four times less likely than Opus 4.7 to let flaws in its own code pass unremarked.
For CX leaders, this is not a forklift upgrade. It is a quality-of-life release for the slice of traffic already routed to Opus, plus a few new primitives (dynamic workflows, effort control, mid-task system messages) that change how agentic stacks can be wired.
What Changed vs Opus 4.7
Five things are materially different.
Honesty as a benchmark. Anthropic specifically trained and measured for honesty in 4.8. The model is more likely to flag uncertainty, push back on weak plans, and surface issues in inputs or outputs instead of papering over them. The headline number: around 4x fewer unflagged flaws in code it writes versus Opus 4.7. For CX, the analogue is a model that escalates ambiguity instead of hallucinating confident answers.
Same price, better model. API pricing is unchanged: $5 per million input tokens and $25 per million output tokens. Every CX team currently routing to Opus 4.7 gets a free quality bump by swapping the model ID to claude-opus-4-8.
Fast mode is cheaper and faster. Opus 4.8 ships with a fast mode at 2.5x speed, priced at $10 input and $50 output per million tokens. Anthropic describes this as three times cheaper than the fast mode on previous Opus models. That changes the economics of synchronous chat with an Opus-class model.
Effort control is now a user-facing dial. Where Opus 4.7 introduced the xhigh tier in the API, Opus 4.8 exposes effort directly to end users on claude.ai and Cowork, and inside Claude Code via xhigh and max. High is the new default. The model uses roughly the same number of tokens as Opus 4.7 at default effort, but produces better answers.
Dynamic workflows in Claude Code. Available in research preview on Enterprise, Team, and Max plans. Claude can now plan a task and run hundreds of parallel subagents in one session, verify outputs, and report back. Anthropic gives codebase-scale migrations across hundreds of thousands of lines of code as the worked example. For CX, the relevant analogue is long-horizon agentic operations: end-of-day reconciliations, bulk refund sweeps, KB rewrites.
Bonus: mid-task system messages. The Messages API now accepts system entries inside the messages array. Agents can update permissions, token budgets, or environment context mid-run without breaking the prompt cache or routing through a user turn. This is a small surface change with real consequences for production agent harnesses.
Industry Reception
Anthropic published quotes from eleven early testers alongside the launch. The pattern is consistent.
Cursor reports Opus 4.8 exceeds prior Opus models across every effort level on CursorBench, with meaningfully more efficient tool calling (fewer steps for the same intelligence). Genspark says it is the only model to complete every case end-to-end on their Super-Agent benchmark, beating GPT-5.5 at parity on cost. Browserbase reports 84 percent on Online-Mind2Web, calling it the strongest computer-use and browser-agent model they have tested. Devin highlights that Opus 4.8 fixes the comment-verbosity and tool-calling regressions they saw with Opus 4.7. Databricks reports 61 percent cheaper token cost than Opus 4.7 for multimodal reasoning over PDFs and diagrams in their Genie agent.
Two themes run across the testimonials. First, judgement: testers describe a model that asks better questions, catches its own mistakes, and pushes back when a plan is unsound. Second, reliability on long-horizon agentic work, which is exactly where prior Opus generations had visible failure modes.
What This Means for CX Teams
The practical implications are narrower than the launch noise suggests.
Swap the model ID. If you are already routing escalated, long-context, or supervisor-layer traffic to Opus 4.7, change the model ID to claude-opus-4-8. Same price, better outputs. No architectural work.
Re-evaluate fast mode for synchronous chat. The 2.5x speed at three times lower cost than the prior fast tier makes Opus 4.8 fast mode plausible for live chat use cases where Opus 4.7 was previously too slow or too expensive. Run it against your current mid-tier model on the same case mix and measure.
Lean on the honesty gains for high-stakes flows. Refunds, account changes, billing disputes, and anything regulated benefit from a model that flags uncertainty rather than confidently guessing. Move those flows to Opus 4.8 at high or xhigh effort and watch your false-confident-answer rate.
Wait on dynamic workflows. They are powerful, but they are research preview and currently inside Claude Code. The CX analogue (hundreds of parallel subagents working through a long-horizon operation) is real, but worth piloting on internal back-office work before customer-facing flows.
Do not push Opus 4.8 down market. The 80 percent of tier-1 traffic that lives on flash-class models should stay there. Opus 4.8 is still 10 to 30x the cost per token of those models. The right architecture is unchanged: route each contact to the cheapest model capable of resolving it at the required quality bar.
AI Readiness Score
How ready is your team for AI?
6 quick questions. Get a personalised score and action plan.
Try the AI Readiness Score1000+ agents deployed worldwide · 4.8 on G2
How Opus 4.8 vs 4.7 Stacks Up at a Glance
Price per million tokens (input / output): identical at $5 / $25. Fast mode: new on 4.8 at $10 / $50, around three times cheaper than the prior generation. Honesty on code review (unflagged flaws): around 4x lower on 4.8. Default effort: high on 4.8, with extra and max for harder tasks. Computer use (Online-Mind2Web, Browserbase): 84 percent on 4.8, a meaningful jump over 4.7. Tool calling efficiency (CursorBench): better intelligence per step on 4.8. Multimodal cost on dense documents (Databricks Genie): 61 percent cheaper on 4.8.
The Honest Caveats
Two things to watch.
Anthropic itself frames 4.8 as a modest but tangible improvement, and signals that the next class of model (Mythos) is what to expect for a real capability jump. Do not over-rotate the roadmap on this release.
Higher effort tiers spend more tokens. The default effort on Opus 4.8 lands at roughly the same token spend as 4.7 default, but extra and max will move your bill. The right answer is the same routing discipline as always: pay the premium only when the case actually needs it.
What to Do This Week
Three concrete moves.
Flip your Opus traffic to claude-opus-4-8. Same price, better model. Measure autonomy, CSAT, and false-confident-answer rate over two weeks against the prior baseline.
Pilot fast mode on one synchronous use case. Pick a chat flow currently on a mid-tier model where reasoning quality matters. Run Opus 4.8 fast mode in parallel and compare cost, latency, and resolution rate.
Audit your high-stakes flows for honesty wins. Refunds, billing, account changes, regulated answers. Route them through Opus 4.8 at high or xhigh effort and track the rate of escalated-with-uncertainty versus confidently-wrong answers. The honesty gains should show up here first.
Anthropic has shipped a focused, well-priced upgrade. The discipline that makes it valuable in a CX operation is the same as it has always been: route the work to the cheapest model that can do it well, and pay the premium tier only when the case actually needs it.
Try it directly at claude.ai or via the Anthropic API. To see how Certainly routes contacts across Opus, GPT, and Gemini in production, book a demo.