Mistral AI releases Mixtral 8x7B, a sparse MoE model that matches or outperforms GPT-3.5

The open-weights model, using only 12.9B active parameters out of 46.7B total, outperforms Llama 2 70B on most benchmarks and costs the same to run as a 12.9B model.

Mistral AI today released Mixtral 8x7B, a sparse mixture-of-experts model with 46.7 billion total parameters, of which only 12.9 billion are active per token. Announced in an official blog post, the model outperforms Meta’s Llama 2 70B on most benchmarks while running inference six times faster. It matches or beats GPT-3.5 on most standard benchmarks, offers a 32,000-token context window, and supports English, French, Italian, German, and Spanish.
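The gap between total and active parameters can be made concrete with a little arithmetic. The sketch below is a back-of-the-envelope estimate, not Mistral's own accounting. It assumes eight experts with two routed per token (the "8x" in the name and Mistral's announcement suggest this, but the figures above do not state it), and it treats every parameter as either shared by all tokens or belonging to exactly one expert. The function name and constants are illustrative.

```python
# Rough split of Mixtral 8x7B's parameters into shared and per-expert parts.
# Assumption (not stated in the article): 8 experts, 2 routed per token.
TOTAL_B = 46.7         # total parameters in billions (from the article)
ACTIVE_B = 12.9        # active parameters per token in billions (from the article)
NUM_EXPERTS = 8        # assumed
EXPERTS_PER_TOKEN = 2  # assumed


def split_parameters(total_b, active_b, num_experts, experts_per_token):
    """Solve the two linear equations
         shared + num_experts * e       = total
         shared + experts_per_token * e = active
    for (shared, e), where e is the size of one expert."""
    per_expert = (total_b - active_b) / (num_experts - experts_per_token)
    shared = total_b - num_experts * per_expert
    return shared, per_expert


shared_b, per_expert_b = split_parameters(
    TOTAL_B, ACTIVE_B, NUM_EXPERTS, EXPERTS_PER_TOKEN
)
print(f"shared: {shared_b:.2f}B, per expert: {per_expert_b:.2f}B")
print(f"active fraction per token: {ACTIVE_B / TOTAL_B:.1%}")
```

Under these assumptions, each expert holds roughly 5.6 billion parameters and about 1.6 billion are shared, so each token touches a little over a quarter of the full model. That is why per-token compute is comparable to a 12.9B dense model even though all 46.7B parameters must be held in memory.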

The model is available under the Apache 2.0 license, and Mistral has submitted code to integrate it with vLLM and SkyPilot for open-source deployment.

The record

One year later

Mixtral 8x7B became the go-to open-weights model for efficiency benchmarks throughout 2024. Mistral's valuation more than doubled within a year, though the company later faced scrutiny over the viability of its edge-deployment strategy.
