
RAG Is Not Search

January 9, 2024

Most RAG implementations are built like search engines with an LLM stapled on top. This works until it doesn’t, and when it doesn’t, the failures are spectacular.

The Core Confusion

Search is about finding documents. RAG is about synthesizing answers. These are different tasks with different evaluation criteria.

When a search engine returns irrelevant results, users scroll past them. When a RAG system retrieves irrelevant chunks and feeds them to a language model, the model will cheerfully hallucinate an answer that sounds authoritative while being completely wrong.

The retrieval step in RAG isn’t searching—it’s curating context. The quality bar is different. You’re not asking “does this document contain relevant keywords?” You’re asking “will this text help an LLM generate an accurate, grounded response?”
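
To make the contrast concrete, here is a deliberately naive sketch. Both scorers are illustrative stand-ins (the judge callable in particular is a placeholder for an NLI model or LLM grader), not anything you'd ship:

    def keyword_overlap(query: str, passage: str) -> float:
        # The search question: does this text contain relevant keywords?
        q, p = set(query.lower().split()), set(passage.lower().split())
        return len(q & p) / len(q) if q else 0.0

    def supports_answer(query: str, passage: str, judge) -> bool:
        # The RAG question: could an accurate answer be grounded in this passage?
        # judge is a placeholder for an NLI model or LLM grader returning 0..1.
        return judge(query, passage) >= 0.5

    # A passage can score high on keyword_overlap and still fail supports_answer:
    # it mentions the right terms but answers a different question.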

Where Implementations Fail

Chunking without semantic boundaries. Splitting documents at arbitrary token counts fragments meaning. A paragraph split mid-sentence doesn’t just lose information—it creates confusing context that degrades generation quality.
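
A minimal fix, assuming plain-text documents with blank-line paragraph breaks; word count stands in for real tokenization here:

    def chunk_by_paragraph(text: str, max_tokens: int = 300) -> list[str]:
        # Accumulate whole paragraphs until the budget is hit; never cut mid-sentence.
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
        chunks, current, current_len = [], [], 0
        for para in paragraphs:
            length = len(para.split())  # crude proxy for a tokenizer count
            if current and current_len + length > max_tokens:
                chunks.append("\n\n".join(current))
                current, current_len = [], 0
            current.append(para)
            current_len += length
        if current:
            chunks.append("\n\n".join(current))
        return chunks

    # Oversized single paragraphs are kept whole in this sketch; a real pipeline
    # would recurse into sentences before falling back to a hard split.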

Ignoring recency and provenance. Vector similarity doesn’t know that a 2019 policy document was superseded by a 2023 update. Hybrid retrieval with metadata filtering isn’t optional for serious applications.
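
A sketch of what that can look like in practice. The record shape and the 80/20 weighting are illustrative assumptions, not a particular vector database's API:

    from datetime import date

    def rank_with_metadata(hits: list[dict], as_of: date) -> list[dict]:
        # Each hit is assumed to carry 'similarity', 'effective_date', 'superseded'.
        live = [h for h in hits if not h["superseded"]]  # hard filter before ranking

        def score(h: dict) -> float:
            age_years = (as_of - h["effective_date"]).days / 365.0
            recency = 1.0 / (1.0 + max(age_years, 0.0))  # newer documents score higher
            return 0.8 * h["similarity"] + 0.2 * recency

        return sorted(live, key=score, reverse=True)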

Treating retrieval as a black box. If you can’t inspect which chunks influenced a response, you can’t debug failures or build user trust. Citation and attribution aren’t nice-to-haves—they’re table stakes for editorial use cases.
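
One way to keep the trail visible end to end. The llm callable and the chunk shape are placeholders for whatever stack you're on; the point is that IDs survive from retrieval through to the response:

    def answer_with_citations(query: str, chunks: list[dict], llm) -> dict:
        # Each chunk is assumed to look like {"id": ..., "source": ..., "text": ...}.
        context = "\n\n".join(f"[{c['id']}] {c['text']}" for c in chunks)
        prompt = (
            "Answer using only the numbered excerpts below, citing excerpt IDs "
            f"in brackets.\n\n{context}\n\nQuestion: {query}"
        )
        return {
            "answer": llm(prompt),
            "citations": [{"id": c["id"], "source": c["source"]} for c in chunks],
        }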

What Good Looks Like

The RAG systems I build treat retrieval as a first-class concern with its own evaluation metrics:

  • Chunk quality scores based on semantic coherence, not just embedding similarity
  • Provenance tracking that follows context from retrieval through generation
  • Fallback strategies when confidence is low—including “I don’t have enough information to answer this”

That last one is the hard part. Teaching a system to refuse confidently is more difficult than teaching it to answer. But for newsrooms and editorial teams, it’s the difference between a useful tool and a liability.
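
Here's a minimal sketch of that confidence gate, assuming retrieval scores normalized to [0, 1]; the threshold is illustrative, not a tuned value:

    REFUSAL = "I don't have enough information to answer this."

    def generate_or_refuse(query: str, chunks: list[dict], llm, min_score: float = 0.6) -> dict:
        supported = [c for c in chunks if c["score"] >= min_score]
        if not supported:
            # Refusal is a first-class outcome, not an error path.
            return {"answer": REFUSAL, "citations": []}
        context = "\n\n".join(c["text"] for c in supported)
        answer = llm(f"Answer strictly from this context:\n\n{context}\n\nQuestion: {query}")
        return {"answer": answer, "citations": [c["id"] for c in supported]}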


This is the kind of technical nuance that matters in production. If you’re building RAG for a context where accuracy is non-negotiable, let’s talk.