According to Tongyi's official Weibo account, Qwen2.5-Max was officially released on January 29. Qwen2.5-Max achieved world-leading results on mainstream authoritative benchmarks. These cover knowledge (MMLU-Pro, which tests university-level knowledge), programming (LiveCodeBench), overall capabilities (LiveBench), and alignment with human preferences (Arena-Hard). The Tongyi team evaluated the instruct and base versions of Qwen2.5-Max separately. The instruct model is the version that anyone can try directly through chat. On Arena-Hard, LiveBench, LiveCodeBench, GPQA-Diamond, and MMLU-Pro, Qwen2.5-Max performs on par with Claude-3.5-Sonnet and outperforms GPT-4o, DeepSeek-V3, and Llama-3.1-405B on nearly all of these benchmarks.
Alibaba Qwen2.5-Max officially released, surpassing GPT-4o and DeepSeek-V3
2025-01-29 16:02:28