A PhD-dorm brainstorm that reinvented the Scaling Law? Qwen and Zhejiang University team up on a new law that cuts inference memory by 95.5%!
AI前线·2025-05-21 18:04
Compiled by | Hua Wei

To raise the intelligence of large language models (LLMs), there are two mainstream Scaling Law routes. The first is parameter scaling: using more model parameters to learn in finer detail, which is very heavy on GPU memory. The second is inference-time scaling: lengthening the chain of thought, which is very time-consuming, depends on training data and the training strategy (RL), and only suits some scenarios.

| Method | Inference Time | Inference Space | Training Cost | Specialized Strategy |
| --- | --- | --- | --- | --- |
| Dense Scaling | Moderate | High | Pre-training only | No |
| MoE Scaling | Low | High | Pre-training only | Load balancing |
| Inference-Time Scaling | High | Moderate | Post-training | RL / reward data |
| Parallel Scaling | Moderate | Low | … | … |
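For intuition, below is a minimal sketch of the parallel-scaling idea named in the last row of the table: run P learnable variants of the same input through one shared model and aggregate the P outputs with learned weights. The specific input transform (a learnable per-stream shift on the embeddings) and the gating layer are illustrative assumptions, not necessarily the paper's exact design.

```python
# Minimal sketch of parallel scaling, under the assumptions stated above:
# P streams share one set of model weights; each stream perturbs the input
# differently, and a learned gate aggregates the P outputs.
import torch
import torch.nn as nn

class ParallelScaledModel(nn.Module):
    def __init__(self, base_model: nn.Module, d_model: int, num_streams: int = 4):
        super().__init__()
        self.base_model = base_model          # shared weights across all streams
        self.num_streams = num_streams
        # One learnable perturbation per stream, broadcast over the sequence
        # (an assumed stand-in for the paper's input transformation).
        self.stream_shift = nn.Parameter(torch.zeros(num_streams, 1, d_model))
        # Small head that scores each stream's output for dynamic weighting.
        self.gate = nn.Linear(d_model, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) token embeddings
        outs = [self.base_model(x + self.stream_shift[i])
                for i in range(self.num_streams)]
        outs = torch.stack(outs, dim=0)             # (P, batch, seq, d_model)
        # Dynamic weights: softmax over the P streams, per position.
        w = torch.softmax(self.gate(outs), dim=0)   # (P, batch, seq, 1)
        return (w * outs).sum(dim=0)                # weighted aggregation

# Usage with a stand-in "model" (a single linear layer, for illustration only).
base = nn.Linear(64, 64)
model = ParallelScaledModel(base, d_model=64, num_streams=4)
y = model(torch.randn(2, 16, 64))
print(y.shape)  # torch.Size([2, 16, 64])
```

Because all P streams reuse the same weights, the extra capacity comes from compute rather than parameters, which is consistent with the low inference-space cost the table attributes to parallel scaling.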