Mixture of Experts (MoE)

Also known as · MoE

An architecture that activates only part of the model for each token, saving compute.

A mixture-of-experts model is split into many specialized sub-networks ('experts'), of which only a few are activated for any given token. A small router network scores the experts and selects which ones process each token. As a result, the model can have a very large total parameter count while performing only a fraction of the corresponding computation per token.
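The routing step can be sketched in a few lines. This is a minimal illustration rather than any particular model's implementation: the toy experts, the `toy_router` scoring function, and the choice to normalize weights over only the selected top-k experts are all assumptions made for the example.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    # Assumption: weights are renormalized over the selected experts only.
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))

def moe_layer(token, experts, router, k=2):
    """Run only the k selected experts and mix their outputs by router weight."""
    chosen = route_token(router(token), k)
    out = [0.0] * len(token)
    for idx, weight in chosen:
        expert_out = experts[idx](token)  # unselected experts are never called
        out = [o + weight * y for o, y in zip(out, expert_out)]
    return out, [idx for idx, _ in chosen]

# Hypothetical toy experts: expert i simply scales its input.
def make_expert(scale):
    return lambda token: [scale * x for x in token]

experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]

def toy_router(token):
    """Hypothetical router: one score per expert, derived from the token."""
    s = sum(token)
    return [0.1 * s, 0.5 * s, -0.2 * s, 0.5 * s + 1.0]

output, used = moe_layer([1.0, 2.0], experts, toy_router, k=2)
print("experts used:", used)
print("output:", output)
```

With four experts and k=2, half of the expert computation is skipped for this token; real models typically use many more experts, so the skipped share is larger.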

This decouples capacity from cost. An MoE model can hold far more 'knowledge' than a dense model with the same per-token compute budget, because most of its parameters sit idle for any given token.
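A little arithmetic makes the gap between capacity and cost concrete. The sketch below uses made-up numbers that do not describe any real model, and it ignores the router's own (small) parameter count; the function and parameter names are hypothetical.

```python
def moe_param_counts(shared_params, params_per_expert,
                     num_experts, experts_per_token, num_moe_layers):
    """Return (total, active-per-token) parameter counts for a simplified MoE."""
    # Total: every expert in every MoE layer counts toward capacity.
    total = shared_params + num_moe_layers * num_experts * params_per_expert
    # Active: only the experts selected for a token contribute to its compute.
    active = shared_params + num_moe_layers * experts_per_token * params_per_expert
    return total, active

# Hypothetical configuration, chosen only to illustrate the ratio.
total, active = moe_param_counts(
    shared_params=2_000_000_000,    # attention, embeddings, etc.
    params_per_expert=100_000_000,
    num_experts=64,
    experts_per_token=2,
    num_moe_layers=32,
)
print(f"total parameters:  {total:,}")
print(f"active per token:  {active:,}")
print(f"active fraction:   {active / total:.1%}")
```

Under these assumed numbers, roughly 4% of the parameters are used for any one token, which is the sense in which most parameters sit idle.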

Many frontier models are believed to use MoE designs. The trade-offs are added engineering complexity and higher memory use, since all experts must be loaded even though only a few run for each token, but the efficiency gains at scale are substantial.
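The memory side of the trade-off can be estimated the same way. This is a rough sketch under stated assumptions: the parameter counts are hypothetical round numbers, weights are assumed to be stored at 2 bytes per parameter (16-bit), and activations and caches are ignored.

```python
def weight_memory_gib(num_params, bytes_per_param=2):
    """Approximate memory needed to hold the weights, in GiB."""
    return num_params * bytes_per_param / 2**30

# Hypothetical model: 200B total parameters, 10B active per token.
total_params = 200_000_000_000
active_params = 10_000_000_000

loaded = weight_memory_gib(total_params)         # every expert must be resident
used_per_token = weight_memory_gib(active_params)

print(f"weights that must be loaded: {loaded:.1f} GiB")
print(f"weights touched per token:   {used_per_token:.1f} GiB")
print(f"ratio: {loaded / used_per_token:.0f}x")
```

Per-token compute scales with the active parameters, but the memory footprint scales with the total, which is why serving an MoE model needs hardware sized for the full expert set.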

Learn more in Module 16 — The AI Model Stack →
