Context Window
Also known as: context length, context
The maximum amount of text (in tokens) a model can consider at once.
The context window is the model's working memory: the total number of tokens it can take in and reason over in a single request, counting both your prompt and its own response. If a conversation or document exceeds that limit, something has to be dropped or summarized to make it fit; in a long conversation, that is usually the oldest content.
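As a rough illustration of dropping the oldest content, the sketch below keeps only the most recent messages that fit in the window while reserving room for the response. The function names, the parameters, and the 4-characters-per-token estimate are all assumptions made for this example; a real application would count tokens with the model's own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate (assumption: about 4 characters per token)."""
    return max(1, (len(text) + 3) // 4)


def trim_history(messages: list[str], context_window: int,
                 reserve_for_response: int) -> list[str]:
    """Keep the newest messages that fit, dropping the oldest first."""
    # The prompt and the response share the window, so set aside
    # space for the response before filling the rest with history.
    budget = context_window - reserve_for_response
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest and stop at the first message that
    # would overflow the budget.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

Summarizing the dropped messages instead of discarding them follows the same pattern: the summary simply becomes one more message that must fit within the budget.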
Context windows have grown dramatically, from a few thousand tokens in early models to a million or more in some current frontier models. A larger window lets you feed in entire codebases, long documents, or lengthy conversations without the model losing track of earlier material.
Bigger isn't free, though. Processing more context costs more (you pay per token) and can slow responses down, and models don't always use the middle of a very long context as reliably as the beginning and end. Fitting the right information into the window — rather than just dumping everything in — is a real skill.
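One simple way to be selective, sketched here under stated assumptions, is to rank candidate passages by relevance and include only the best ones that fit a token budget. The `Chunk` type, its `relevance` score, and `select_chunks` are hypothetical names for this example; how relevance is computed (keyword match, embeddings, manual tagging) is left open.

```python
from typing import NamedTuple


class Chunk(NamedTuple):
    text: str
    relevance: float  # hypothetical score; higher means more useful
    tokens: int       # token count, assumed to be measured elsewhere


def select_chunks(chunks: list[Chunk], budget: int) -> list[Chunk]:
    """Greedily pick the most relevant chunks that fit within the budget."""
    chosen: list[Chunk] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.relevance, reverse=True):
        # Skip anything that would overflow; a smaller, lower-ranked
        # chunk may still fit in the remaining space.
        if used + chunk.tokens <= budget:
            chosen.append(chunk)
            used += chunk.tokens
    return chosen
```

Greedy selection is not guaranteed to be the best possible packing, but it is easy to reason about and keeps the prompt, and therefore the per-token cost, bounded.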