Mixture of Experts (MoE)

Also known as · MoE

An architecture that activates only part of the model for each token, saving compute.

A mixture-of-experts model is split into many specialized sub-networks ('experts'), but only a few are activated for any given token. A small router scores the experts for each token and sends the token to the highest-scoring few. The result: a model can have a very large total parameter count while doing only a fraction of the computation per token.
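The routing step can be sketched in a few lines. This is a minimal illustration, assuming a common "top-k" scheme where the router's scores go through a softmax and the k best experts share the work. The names `route_token` and `moe_layer`, the toy experts, and the scores are all hypothetical, not taken from any particular model.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_layer(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    return sum(w * experts[i](x) for i, w in route_token(router_logits, k))

# Hypothetical toy experts: each one just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]

# Made-up router scores for one token; experts 1 and 3 score highest.
logits = [0.1, 2.0, -1.0, 2.0]
print(route_token(logits, k=2))
print(moe_layer(1.0, experts, logits, k=2))
```

Here four experts exist but only two run for this token; the other two cost nothing on this forward pass. Real routers are learned layers and often add load-balancing terms, which this sketch leaves out.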

This decouples capacity from cost. An MoE model can hold far more 'knowledge' than a dense model with the same per-token compute budget, because most of its parameters sit idle on any single forward pass.
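A quick back-of-the-envelope calculation shows the gap between total and active parameters. The numbers below are invented for illustration and do not describe any real model; `moe_param_counts` is a hypothetical helper, and it assumes every token uses the shared parameters plus a fixed number of equally sized experts.

```python
def moe_param_counts(shared, per_expert, num_experts, active_experts):
    """Return (total, active-per-token) parameter counts for a simplified MoE."""
    total = shared + per_expert * num_experts
    active = shared + per_expert * active_experts
    return total, active

# Illustrative values only: 2B shared parameters, 8 experts of 6B each,
# 2 experts active per token.
total, active = moe_param_counts(
    shared=2_000_000_000,
    per_expert=6_000_000_000,
    num_experts=8,
    active_experts=2,
)

print(f"total parameters:  {total / 1e9:.0f}B")   # 50B
print(f"active per token:  {active / 1e9:.0f}B")  # 14B
print(f"fraction active:   {active / total:.0%}")  # 28%
```

Under these assumptions the model stores 50B parameters but each token touches only 14B of them, which is roughly the compute of a 14B dense model.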

Many frontier models are believed to use MoE designs. The trade-offs are added engineering complexity and a larger memory footprint (all experts must be loaded, even though only a few run per token), but the efficiency gains at scale are substantial.
