one year on
Alibaba releases Qwen 2.5 family with up to 72B parameters at Apsara Conference
The open-weight model family, trained on up to 18 trillion tokens, includes specialized Coder and Math variants and claims performance rivaling larger models like Llama 3.1 405B.
At the Apsara Conference in Hangzhou today, Alibaba Cloud unveiled the Qwen 2.5 family of open-weight models, ranging from 0.5B to 72B parameters. The company says the models were pretrained on up to 18 trillion tokens, yielding an average performance improvement of more than 18% over Qwen2. The 72B variant, Qwen2.5-72B, rivals Llama 3.1 405B on several benchmarks, including MMLU (85+) and HumanEval (85+), despite being much smaller.
The release also includes specialized models: Qwen2.5-Coder, trained on 5.5 trillion tokens of code data, and Qwen2.5-Math, supporting Chinese and English with CoT, PoT, and tool-integrated reasoning. All models except the 3B and 72B are under Apache 2.0. Alibaba Cloud CEO Eddie Wu said that AI has moved faster in the last 22 months than at ‘any historical period,’ but cautioned that ‘we remain in the early stages of the AGI transformation.’
The company also open-sourced Qwen2-VL-72B and benchmarked Qwen-Plus, its API-based model, against GPT4-o, Claude-3.5-Sonnet, Llama-3.1-405B, and DeepSeek-V2.5, saying it significantly outcompetes DeepSeek-V2.5 and is competitive with Llama-3.1-405B while still underperforming GPT4-o and Claude-3.5-Sonnet in some aspects.
The release also adds 100 open-sourced Qwen 2.5 multimodal models and a text-to-video AI solution, according to Alibaba Cloud.
The record
Described the release as 'what might be the largest open-source release in history', highlighting performance gains of over 18% on average over Qwen2.
view the original post →Said AI has evolved faster in the last 22 months than at 'any historical period' but that we remain in 'the early stages of the AGI transformation'.
view the original post →One year later — open only if you can handle spoilers
Qwen 2.5 became the most-downloaded open-weight family on Hugging Face by late 2024, but DeepSeek's V3 release in December 2024 shifted much of the open-source community's attention. The Coder and Math variants saw strong adoption in specialized tooling, though the 32B model's delayed availability frustrated some developers.