How do two group members who work on large language models rate Xiaomi's MiMo model?
理想TOP2·2025-04-30 21:04
Group members A and B both work on large language models; B is also a die-hard Xiaomi fan and holds Xiaomi stock.

Group member A: This Xiaomi model looks like it was built specifically to climb the math and coding leaderboards. Its other capabilities will degrade, and it is a poor match for what real users actually need.

| Benchmark | # Shots | Llama-3.1 8B Base | Gemma-2 9B Base | Qwen2.5 7B Base | MiMo-7B Base |
| --- | --- | --- | --- | --- | --- |
| General | | | | | |
| BBH (EM) | 3-shot | 64.2 | 69.4 | 70.4 | 75.2 |
| GPQA-Diamond (EM) | 5-shot | 33.3 | 24.2 | 35.4 | 25.8 |
| SuperGPQA (EM) | 5-shot | 19.9* | 22.6* | 24.6* | 25.1 |
| DROP (F1) | 3-shot | 59.5 | 67.9* | 61.5* | 69.2 |
| MMLU (EM) | 5-shot | 65.3 | 71.2 | 74 ...