
Distillation

Also known as: knowledge distillation, model distillation

Training a smaller 'student' model to mimic a larger 'teacher' model.

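Knowledge distillation trains a smaller, cheaper 'student' model to reproduce the behavior of a larger 'teacher' model. Instead of (or in addition to) learning only from the correct answers in raw data, the student learns from the teacher's outputs, often the teacher's full probability distribution over possible answers rather than just its top pick. These 'soft targets' carry extra signal, such as which wrong answers the teacher considers nearly right. That extra signal helps the student capture much of the teacher's capability at a fraction of the size and cost.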
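To make "learning from the teacher's outputs" concrete, here is a minimal sketch of the classic soft-target distillation loss in plain Python. The student is scored against the teacher's temperature-softened probability distribution (its "soft targets") and, optionally, against the true label. The function names, the temperature of 2.0, and the 50/50 weighting `alpha` are illustrative assumptions, not any particular library's API. Real training would compute this loss over batches in a deep-learning framework and backpropagate it into the student's weights.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities; higher temperature gives a flatter distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=2.0, alpha=0.5):
    """Illustrative loss: alpha * soft-target term + (1 - alpha) * hard-label term.

    temperature and alpha are assumed example values, not prescribed settings.
    """
    # Soft targets: teacher and student distributions at the same temperature
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)

    # KL(teacher || student): how far the student is from mimicking the teacher
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student_soft) if pt > 0)

    # Common convention: scale by T^2 so the soft term's gradients keep a
    # similar magnitude when the temperature changes
    soft_loss = kl * temperature ** 2

    # Ordinary cross-entropy against the correct label, at temperature 1
    p_student = softmax(student_logits, 1.0)
    hard_loss = -math.log(p_student[hard_label])

    return alpha * soft_loss + (1 - alpha) * hard_loss

if __name__ == "__main__":
    # Hypothetical 3-class example: the teacher is confident in class 0
    teacher = [4.0, 1.0, 0.5]
    student = [2.0, 1.5, 0.2]
    print(distillation_loss(student, teacher, hard_label=0))
```

Raising the temperature flattens the teacher's distribution, so the student sees more of the teacher's relative preferences among the "wrong" classes. Setting `alpha` to 1.0 trains purely on the teacher's outputs, which corresponds to the "instead of" case above. Values below 1.0 correspond to "in addition to".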

This is one reason many fast, inexpensive models punch above their weight: they have been distilled from larger frontier models. For many everyday tasks, a distilled model is more than good enough and far cheaper to run.

Distillation, quantization, and architectural efficiency together help explain a broad industry trend: capability that required a giant model a year ago often runs on a much smaller one today.
