Your LLM Doesn’t Know You

Here’s How to Teach It with RAG

LLMs are great at language but bad at facts. They hallucinate, forget details, and by design, can’t access your private database or yesterday’s update.

The fix? Retrieval-Augmented Generation (RAG). Instead of expecting the model to “know everything,” we give it the ability to look things up in real time.

At its core, RAG = retriever + generator. A retriever pulls the most relevant chunks of your data, and the LLM generates an answer based on them.

By the end of this article, you’ll have a solid mental model of RAG and a minimal example you can extend into production.

What is RAG?

Retrieval-Augmented Generation is a design pattern where an LLM’s input prompt is augmented with external context retrieved from a knowledge base.

So you don't expect the model to know everything. You hand it the relevant knowledge and ask it to reason over that to produce an answer.

RAG vs. fine-tuning vs. prompting

Plain prompting: Good for reasoning and creativity, but limited to the model’s training data.
Fine-tuning: Great when your task is narrow and stable, but retraining is costly.
RAG: Best when you need freshness, private data, or traceability—without touching model weights.

Key components:

Retriever – finds relevant context in the knowledge base you provide.
Index – stores data in a searchable form (as vectors, in this case).
Embeddings – convert text into vectors for semantic search.
Generator – the LLM that crafts the answer.

Why RAG is used

Reduce hallucinations – Ground answers in real context, your context.
Access private or dynamic data – Like company docs, APIs, or live feeds.
Lower cost and faster iteration – No need to fine-tune for every change.
Citations and control – Show sources, add filters, and audit outputs.

How RAG works:
Indexing/Storing + Retrieval/Searching

RAG shines when you have a large body of data that is too time-consuming to read through or remember, and you want a quick, easy way to pull out the relevant context.

First, we need a way to store the data. This is the indexing step.

When you have a large dataset, it’s not practical (or even possible) to dump it all into an LLM. Instead, we structure it so the model can look things up when needed. This happens in three main steps:

Chunking
- Big documents are split into smaller, manageable pieces.
- Example: a 100-page PDF becomes 500-token chunks with slight overlaps.
- Why? Because retrieval works better when units are small and precise.
Embeddings
- Each chunk is converted into a high-dimensional vector (a numerical representation of meaning).
- This lets us search semantically — “refund policy” will match “money-back guarantee,” even if the exact words differ.
- You can read more about them in the vector embeddings section of: https://peerlist.io/prasadware/articles/rookie-understanding-of-ai
Indexing
- These vectors are stored in a special database (vector index) that supports fast similarity search.
- Tools: FAISS, Pinecone, Weaviate, pgvector, and Qdrant. Even MongoDB now stores vectors.
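
The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a production setup: chunk sizes are counted in words as a rough proxy for tokens, `toy_embed` is a hypothetical stand-in for a real embedding model (it only captures word overlap, not meaning), and the "index" is just a list in memory rather than one of the tools listed above. The names `chunk_words`, `toy_embed`, and `build_index` are made up for this example.

```python
import hashlib
import math


def chunk_words(text, chunk_size=500, overlap=50):
    """Split text into word-based chunks; neighbours share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


def toy_embed(text, dims=64):
    """Assumption: stand-in for a real embedding model.

    Hashes each word into one of `dims` buckets and L2-normalizes the counts.
    """
    vec = [0.0] * dims
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(documents, chunk_size=500, overlap=50):
    """Chunk every document, embed each chunk, and keep the results in a list."""
    index = []
    for doc_id, text in documents.items():
        for n, chunk in enumerate(chunk_words(text, chunk_size, overlap)):
            index.append({
                "doc": doc_id,       # where the chunk came from (useful for citations)
                "chunk_no": n,
                "text": chunk,
                "vector": toy_embed(chunk),
            })
    return index
```

Keeping the source document id next to each vector is what later makes citations possible. In a real system you would swap `toy_embed` for an embedding model and the list for a vector database.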

Now, when a user asks a question, the system retrieves only the most relevant chunks from this index, and those chunks are passed into the LLM’s prompt.
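
Retrieval itself is usually a nearest-neighbour search: embed the question, score it against every stored vector, and keep the best few. Here is a rough sketch using cosine similarity. The three-dimensional vectors and the sample texts are hand-written for illustration only (real embeddings have hundreds or thousands of dimensions and come from a model), and `retrieve` is a hypothetical helper, not any library's API.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vector, index, k=2):
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = ((cosine_similarity(query_vector, vec), text) for text, vec in index)
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


# Made-up chunks with made-up vectors, purely to show the mechanics.
index = [
    ("Money-back guarantee: contact support to get your payment returned.", [0.9, 0.1, 0.0]),
    ("Shipping times depend on your region.", [0.1, 0.9, 0.1]),
    ("Our office is closed on public holidays.", [0.0, 0.2, 0.9]),
]

# Pretend this is the embedding of "What's the refund policy?".
query_vector = [0.8, 0.2, 0.1]

top = retrieve(query_vector, index, k=2)
for score, text in top:
    print(f"{score:.3f}  {text}")
```

This brute-force scan compares the query with every chunk, which is fine for small datasets. Vector databases exist to make the same lookup fast at much larger scale.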

The Generation Step

At query time, the pipeline looks like this:

User asks a question (e.g., “What’s the refund policy?”).
Retriever pulls the top-k most relevant chunks from the indexed data.
The LLM receives both:
- The user’s question.
- The retrieved context.
It generates an answer grounded in the provided context, often with citations.
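
The query-time flow above can be sketched as a small function that stitches the pieces together. This is an assumption-laden outline, not a reference implementation: `retrieve` and `generate` are passed in as plain callables so you can plug in any retriever and any LLM client, the prompt wording is just one reasonable choice, and the two `fake_*` functions are stubs standing in for real components.

```python
def build_prompt(question, chunks):
    """Combine the retrieved chunks and the user's question into one prompt."""
    sources = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the sources below. "
        "Cite sources by number, and say so if the answer is not in them.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


def answer(question, retrieve, generate, k=3):
    """Retrieve the top-k chunks, build the prompt, and ask the generator."""
    chunks = retrieve(question, k)
    prompt = build_prompt(question, chunks)
    return generate(prompt)


# Stubs for illustration only; replace with a real retriever and LLM call.
def fake_retrieve(question, k):
    chunks = [
        "Money-back guarantee: contact support to get your payment returned.",
        "Shipping times depend on your region.",
    ]
    return chunks[:k]


def fake_generate(prompt):
    return f"(stubbed LLM reply to a prompt of {len(prompt)} characters)"


prompt = build_prompt("What's the refund policy?", fake_retrieve("What's the refund policy?", 2))
reply = answer("What's the refund policy?", fake_retrieve, fake_generate, k=2)
print(prompt)
print(reply)
```

Numbering the sources in the prompt is what lets the model cite them, and telling it to admit when the sources don't contain the answer is a simple guard against hallucinated replies.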

This is the magic of RAG: the model doesn’t have to “know” your data in its weights, it just needs the ability to reason over whatever context you supply.

Conclusion

LLMs are powerful, but by themselves, they’re blind to your private or recent data. RAG bridges this gap. It combines a retriever (to find the right context) with a generator (to answer the question), letting models “know” your data without retraining.

Think of it like giving the model a library card: it doesn’t memorize every book, but it knows how to quickly find the right page and explain it back to you.

If you want your AI systems to be factual, fresh, and trustworthy, RAG is the pattern you’ll end up using.
