Inference

Also known as: serving, generation

Running a trained model to generate output — what happens on every prompt.

Inference is the act of using a trained model: you send a prompt, the model runs its math, and it generates a response token by token. Every API call, every chat message, every code completion is an inference.
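To make "token by token" concrete, here is a minimal sketch of a greedy decoding loop. Everything in it is an assumption made for illustration: `toy_logits` is a hypothetical stand-in for a trained model (a fixed rule rather than a real network), and the four-entry `VOCAB` is made up. The shape of the loop is the point: score every possible next token, pick one, append it, repeat.

```python
import math

# Hypothetical four-token vocabulary; index 0 marks end of sequence.
VOCAB = ["<eos>", "hello", "world", "!"]


def toy_logits(tokens):
    """Stand-in for a trained model: returns one score per vocabulary entry.

    The rule is fixed (it favors the token after the last one), mirroring
    the fact that parameters do not change during inference.
    """
    favored = (tokens[-1] + 1) % len(VOCAB)
    return [5.0 if i == favored else 0.0 for i in range(len(VOCAB))]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate(prompt_tokens, max_new_tokens=8, eos_id=0):
    """Greedy decoding: repeatedly append the most probable next token."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = softmax(toy_logits(tokens))
        next_id = max(range(len(probs)), key=probs.__getitem__)
        if next_id == eos_id:
            break  # the model signalled that the response is finished
        tokens.append(next_id)
    return tokens


output = generate([1])  # prompt is the single token "hello"
print(" ".join(VOCAB[t] for t in output))
```

Real systems usually sample from the probabilities rather than always taking the top one, and each step runs a full neural network instead of a lookup rule, which is where the cost comes from.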

Unlike training, inference doesn't change the model: the parameters stay fixed. But inference is where the ongoing cost lives. Training is largely a one-time expense, while inference cost recurs with every request, so for a widely used model the cumulative bill for serving millions of requests can far exceed the cost of the training run. That is why so much engineering goes into making inference faster and cheaper.
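The one-time versus recurring distinction can be sketched as simple arithmetic. The figures below are placeholders invented for illustration, not real prices, and the function names are hypothetical. Costs are kept in integer cents to avoid floating-point rounding.

```python
def breakeven_requests(training_cost_cents, cost_per_request_cents):
    """Requests needed before cumulative inference spend matches training spend."""
    if cost_per_request_cents <= 0:
        raise ValueError("cost_per_request_cents must be positive")
    # Ceiling division on integers: round up to the next whole request.
    return -(-training_cost_cents // cost_per_request_cents)


def cumulative_inference_cost(requests, cost_per_request_cents):
    """Total inference spend after a given number of requests."""
    return requests * cost_per_request_cents


# Placeholder figures, made up for this example only.
TRAINING_COST_CENTS = 100_000
COST_PER_REQUEST_CENTS = 2

n = breakeven_requests(TRAINING_COST_CENTS, COST_PER_REQUEST_CENTS)
print(n)  # prints 50000
```

Past the break-even point, every further request adds to the inference total while the training cost stays where it was.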

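Techniques like quantization (storing weights at lower numeric precision), caching (reusing work already computed for earlier tokens or repeated prompts), and specialized hardware all exist to drive down the cost and latency of inference without hurting quality too much.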
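As one illustration of the trade-off, here is a minimal sketch of symmetric 8-bit quantization, assuming a plain list of floats stands in for a model's weights. It is one simple scheme among many, and the function names are hypothetical. Each weight is stored as a small integer plus one shared scale factor, which saves memory at the price of a small rounding error.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]  # made-up values for the example
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per weight is at most about half of one scale step.
worst_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, worst_error)
```

The "without hurting quality too much" part corresponds to `worst_error`: the coarser the integer grid, the larger the gap between original and restored weights.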
