Retrieval-Augmented Generation (RAG)

Also known as · RAG

Giving a model relevant external documents at query time so it answers from supplied source text rather than from memory alone.

Retrieval-augmented generation is a technique for grounding a language model in specific, up-to-date, or private information it wasn't trained on. Instead of relying only on what's baked into its parameters, the system first retrieves relevant documents — usually via embeddings and a vector database — and inserts them into the prompt so the model answers from that material.
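The retrieve-then-insert flow described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the word-count `embed` function stands in for a real embedding model, the in-memory list stands in for a vector database, and the names `embed`, `cosine`, `retrieve`, `build_prompt`, and `DOCS` are all hypothetical.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding (assumption): word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents, k=2):
    """Insert the retrieved documents into the prompt ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

# Hypothetical knowledge base, standing in for a vector database.
DOCS = [
    "Refunds are issued within 14 days of a return request.",
    "Our office is closed on public holidays.",
    "Password resets require access to the registered email address.",
]

prompt = build_prompt("How long do refunds take?", DOCS, k=1)
print(prompt)
```

The resulting prompt string is what would be sent to the language model. In a real system the toy `embed` would be replaced by an embedding model and the sorted list by a vector index, but the overall shape (embed, rank, insert into prompt) stays the same.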

RAG addresses two problems at once: it reduces hallucination by giving the model real source text to work from, and it lets the model use information that's newer or more private than its training data, without the cost of retraining.

It's the standard pattern for building chatbots over your own documents — support knowledge bases, internal wikis, legal or financial archives. The quality of a RAG system depends heavily on the retrieval step: garbage context in, garbage answer out.
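Because so much depends on retrieval, it can help to measure that step separately from the final answer. One common way is recall@k: of the documents known to be relevant to a question, what fraction appears in the top k results? The sketch below assumes you have hand-labelled relevant document IDs for a test question; the function name and the IDs are hypothetical.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant) / len(relevant)

# Hypothetical ranking returned by a retriever, best match first.
retrieved = ["d3", "d1", "d7", "d2"]
# Hypothetical hand-labelled relevant documents for the same question.
relevant = ["d1", "d2"]

print(recall_at_k(retrieved, relevant, k=3))  # 0.5: only d1 is in the top 3
print(recall_at_k(retrieved, relevant, k=4))  # 1.0: d1 and d2 are both in the top 4
```

A low score here suggests the model is being handed the wrong context, so prompt changes alone are unlikely to fix the answers.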

Learn more in Module 9 — RAG: Retrieval-Augmented Generation →
