Stanford: MegaBlocks -- Efficient Sparse Training with Mixture-of-Experts

Last updated on the afternoon of April 7, 2026

MegaBlocks: Efficient Sparse Training with Mixture-of-Experts

We present MegaBlocks, a system for efficient Mixture-of-Experts (MoE) training on GPUs. Our system is motivated by the limitations of current frameworks, which restrict the dynamic routing in MoE...

Authors: Trevor Gale, Deepak Narayanan, Cliff Young +1 more
arXiv: 2211.15841
Category: cs.LG
Date: 2022-11-29


Paper Reading

#MoE #Sparse Training


http://zaddle55.github.io/2026/04/04/megablocks/

Author: Zaddle

Published: April 5, 2026

Updated: April 7, 2026

