Latest SWE-rebench Leaderboard: Chinese AI Models Take Four of the Top Ten Spots, GLM-5 Ranks Third

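Gate News, March 25: Ibragim, a maintainer of the SWE-rebench benchmark, published a leaderboard update on March 23. SWE-rebench is a live benchmark that draws fresh software engineering tasks from GitHub every month, so models cannot be optimized for the problems in advance. This update removes the demonstration examples and the 80-step limit used previously, and adds auxiliary evaluation tasks.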

The latest top ten: 1. Claude Opus 4.6 (65.3%); 2. GPT-5.2 medium (64.4%); 3. GLM-5 (62.8%); 4. GPT-5.4 medium (62.8%); 5. Gemini 3.1 Pro Preview (62.3%); 6. DeepSeek-V3.2 (60.9%); 7. Claude Sonnet 4.6 (60.7%); 8. Claude Sonnet 4.5 (60.0%); 9. Qwen3.5-397B-A17B (59.9%); 10. Step-3.5-Flash (59.6%).

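GLM-5, Zhipu AI's open-source model (MIT license), ranks third at 62.8%, the highest-placed open-source model on the leaderboard. Chinese models hold four of the top ten spots: alongside GLM-5, they are DeepSeek's DeepSeek-V3.2 (sixth), Alibaba Tongyi Qianwen's Qwen3.5-397B-A17B (ninth), and StepFun's Step-3.5-Flash (tenth). Li Zixuan, global head of Zhipu's Z.ai, said that in the previous SWE-rebench update every Chinese model fell outside the top ten.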
