Written Ahead of NVIDIA's Earnings: Google's Ten Years Forging a Single Sword
傅里叶的猫·2025-11-19 22:56

The leap is this large because both pre-training and post-training were improved.

The hottest topic today is without question Google's Gemini 3. It is drawing this much attention because the reviews have been uniformly positive and because it scores very well on benchmarks across many dimensions:

| Benchmark | Description | | Gemini 3 Pro | Gemini 2.5 Pro | Claude Sonnet 4.5 | GPT-5.1 |
| --- | --- | --- | --- | --- | --- | --- |
| Humanity's Last Exam | Academic reasoning | No tools | 37.5% | 21.6% | 13.7% | 26.5% |
| | | With search and code execution | 45.8% | - | - | - |
| ARC-AGI-2 | Visual reasoning puzzles | ARC Prize Verified | 31.1% | 4.9% | 13.6% | 17.6% |
| GPQA Diamond | Scientific knowledge | No tools ...