Transformer

Also known as · transformer architecture

The neural-network architecture behind virtually all modern LLMs.

The transformer is the neural-network design that powers nearly every modern language model. Introduced in 2017, its key innovation was the attention mechanism, which lets the model weigh the relationships between all the tokens in its input at once rather than reading strictly left to right.

This parallelism is what made training on massive datasets practical — transformers use hardware like GPUs efficiently, which let models scale to the sizes we see today. The 'GPT' in ChatGPT stands for Generative Pre-trained Transformer.

Almost every well-known model — Claude, GPT, Gemini, Llama — is a transformer variant. Architectural research continues, but the transformer remains the dominant foundation.

Learn more in Module 4 — Transformers & Attention →

Transformer

Related terms

Beyond definitions.