Temperature
Also known as · sampling temperature
A setting that controls how random vs. deterministic a model's output is.
Temperature controls how 'sharp' or 'flat' a model's choice of the next token is. At each step the model produces a probability distribution over possible next tokens, and temperature rescales it: the model's raw scores (logits) are divided by the temperature before they are converted to probabilities. Near 0, the highest-probability token almost always wins, so output is nearly deterministic, focused, and prone to repetition. Higher values flatten the odds so less-likely tokens get a real chance, and output becomes more varied and creative, but also less predictable.
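As a rough illustration of that scaling, here is a minimal sketch in plain Python. The function names (softmax_with_temperature, sample_token) and the example logits are hypothetical, chosen for this entry, and treating a temperature of exactly 0 as "always pick the top token" is an assumed convention rather than any particular library's behavior.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; handle 0 as greedy argmax")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng=random):
    """Pick a token index; temperature 0 is treated as greedy (assumed convention)."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical logits for three candidate tokens.
logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, 0.2)  # low temperature: peaked
plain = softmax_with_temperature(logits, 1.0)  # unchanged softmax
flat = softmax_with_temperature(logits, 2.0)   # high temperature: flatter
```

With these example logits, the top token's probability is roughly 0.99 at temperature 0.2, 0.66 at 1.0, and 0.50 at 2.0, which is the 'sharp' versus 'flat' effect described above.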
The practical rule: use low temperature (around 0–0.3) for tasks that demand precision and consistency, such as code, data extraction, and factual answers, and higher temperature (around 0.7–1.0) for brainstorming and creative writing. Many consumer chat products default to around 0.7.
Temperature is one of a few sampling controls, alongside top-p and top-k, that shape the randomness of generation.
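To show how these controls can fit together, the sketch below applies top-k and top-p to a distribution that has already been temperature-scaled. It assumes the common definitions: top-k keeps only the k most likely tokens, and top-p keeps the smallest set of most likely tokens whose cumulative probability reaches p. The function name filter_top_k_top_p and the example probabilities are hypothetical, and real implementations may order or combine these steps differently.

```python
def filter_top_k_top_p(probs, top_k=None, top_p=None):
    """Zero out tokens outside the top-k / top-p sets, then renormalize.

    Assumes top_k >= 1 and 0 < top_p <= 1 when they are given.
    """
    # Token indices sorted from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    if top_k is not None:
        order = order[:top_k]  # keep only the k most likely tokens

    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:  # smallest set reaching mass p
                break
        order = kept

    total = sum(probs[i] for i in order)
    filtered = [0.0] * len(probs)
    for i in order:
        filtered[i] = probs[i] / total  # renormalize the survivors
    return filtered


# Hypothetical next-token probabilities, e.g. after temperature scaling.
probs = [0.5, 0.3, 0.15, 0.05]
only_top_two = filter_top_k_top_p(probs, top_k=2)
nucleus = filter_top_k_top_p(probs, top_p=0.9)
```

In this example, top_k=2 leaves only the first two tokens, while top_p=0.9 keeps the first three and drops the least likely one. A token would then be sampled from the filtered distribution.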