one year on
Anthropic unveils Claude 2 with 100k-token context window, opens chat to public
The second-generation model can process a novel-length prompt and scores 76.5% on the multiple-choice section of the bar exam, as the startup opens a public beta chat experience in the U.S. and UK.
Anthropic today released Claude 2, the second generation of its text-generating AI model, featuring a headline-grabbing 100,000-token context window that lets it process roughly 75,000 words, more than the full text of The Great Gatsby, in a single prompt. The model is available immediately in beta in the U.S. and UK through a new public-facing website, claude.ai, and via a paid API priced the same as Claude 1.3.
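For a rough sense of what that window means in practice, the sketch below converts a word count into an estimated token count and checks it against the 100,000-token limit. The 0.75 words-per-token ratio is an assumption taken from the article's own "100,000 tokens, roughly 75,000 words" figure. Real tokenizers vary with language and content, and the function names here are hypothetical, not part of any Anthropic API.

```python
import math

# Assumption: about 0.75 English words per token, the ratio implied by
# "100,000 tokens ~ 75,000 words". Actual tokenizers will differ.
WORDS_PER_TOKEN = 0.75
CONTEXT_WINDOW_TOKENS = 100_000  # Claude 2 launch context window, per the article


def estimate_tokens(text: str) -> int:
    """Rough token estimate based on a whitespace word count."""
    word_count = len(text.split())
    return math.ceil(word_count / WORDS_PER_TOKEN)


def fits_in_context(text: str, window: int = CONTEXT_WINDOW_TOKENS) -> bool:
    """Return True if the estimated token count is within the window."""
    return estimate_tokens(text) <= window


if __name__ == "__main__":
    sample = "word " * 75_000  # stand-in for a 75,000-word manuscript
    print(estimate_tokens(sample), fits_in_context(sample))
```

An estimate like this is only useful for ballpark planning. A prompt near the limit should be measured with the model's actual tokenizer.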
Claude 2 scores 76.5% on the multiple-choice section of the bar exam, up from 73% for its predecessor, and 71.2% on the Codex HumanEval Python coding test, versus 56% before. Anthropic says the model is “2x better at giving harmless responses” on an internal red-teaming evaluation, attributing the improvement to its constitutional AI approach. The model was trained on data up to early 2023 and can generate outputs of up to 4,000 tokens.
Head of go-to-market Sandy Banerjee told TechCrunch the model is a tweaked version of Claude 1.3 rather than a ground-up rebuild, and that “we monitor how they’re used, how we can improve performance, as well as capacity.” The 100,000-token context window remains the largest of any commercially available model. Claude 2 can theoretically support an even larger 200,000-token context window, but Anthropic does not plan to support it at launch.
The launch marks Anthropic’s first direct consumer play, bringing it into closer competition with OpenAI. Partners including Jasper and Sourcegraph are already piloting Claude 2. The startup has raised $1.45 billion to date and estimates it will need $5 billion over the next two years to create its envisioned chatbot.
Jasper's VP of Engineering said Claude 2 goes head to head with other state-of-the-art models and is particularly strong in long-form, low-latency use cases.
Sourcegraph's CEO and co-founder said Claude 2's large context window and strong reasoning help Cody, the company's coding assistant, support developers more effectively.
One year later — open only if you can handle spoilers
Claude 2's 100k context window set a bar that competitors quickly chased and then cleared: within a year, OpenAI's GPT-4 Turbo offered 128k tokens and Google's Gemini 1.5 Pro arrived with a 1M-token window. The public beta was a modest success; Anthropic's big breakout came later with Claude 3. The bar exam score became a standard marketing benchmark across the industry.