Anthropic launches Claude 3.5 Sonnet, claims it beats Claude 3 Opus on reasoning, coding and vision benchmarks

Anthropic says the new mid-tier model outperforms the previous flagship on reasoning, coding, and vision benchmarks while running at twice the speed of Claude 3 Opus and costing the same as the earlier Claude 3 Sonnet.

Anthropic today released Claude 3.5 Sonnet, a new mid-tier model that the company says surpasses its own flagship Claude 3 Opus on a wide range of evaluations, including graduate-level reasoning (GPQA), undergraduate knowledge (MMLU), and coding (HumanEval). The model is priced at $3 per million input tokens and $15 per million output tokens — the same as the previous Claude 3 Sonnet — but operates at twice the speed of Opus.
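To make the quoted rates concrete, here is a minimal sketch of the per-request arithmetic. It assumes only the two list prices above; the function name and the token counts are hypothetical, and it ignores any discounts or other billing details that may apply in practice.

```python
# Illustrative cost estimate using the list prices quoted in the article.
# The function name and sample token counts are assumptions for this sketch.

INPUT_USD_PER_MILLION = 3.0    # $3 per million input tokens
OUTPUT_USD_PER_MILLION = 15.0  # $15 per million output tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in US dollars for one request."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


if __name__ == "__main__":
    # One million tokens in each direction: $3 + $15.
    print(estimate_cost_usd(1_000_000, 1_000_000))  # 18.0
```

Because output tokens cost five times as much as input tokens at these rates, a long generated answer affects the bill far more than a long prompt of the same length.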

The company also debuted Artifacts, a preview feature on Claude.ai that renders code, documents, and web designs in a side panel alongside the chat. Users can edit and iterate on Claude’s output in real time. Anthropic described the feature as the first step toward transforming Claude from a conversational AI into a collaborative workspace, with plans to support team collaboration and, eventually, centralized knowledge, documents, and ongoing work in one shared space.

On an internal agentic coding evaluation, Claude 3.5 Sonnet solved 64% of tasks, compared with Opus’s 38%. The model also sets new highs on vision benchmarks, particularly for chart and graph interpretation. Anthropic emphasized that the model remains at ASL-2 after red teaming and independent safety evaluation by the UK AI Safety Institute.

The record

Anthropic: Introducing Claude 3.5 Sonnet

One year later — open only if you can handle spoilers

Claude 3.5 Sonnet quickly became the preferred model for many developers, and Artifacts evolved into a broader platform feature that influenced later products like Claude Code. The model effectively commoditized frontier performance at mid-tier pricing, intensifying the pricing war among big labs.
