Tooling5 min read1 Jul 2026

RAG Isn't Dead. The Lazy Way You're Doing It Is

'RAG is dead' is the hot take of 2026, fueled by long context and agents. The truth: naive chunk-and-retrieve is dying. Retrieval itself is fine.

Written by northstar editorial·Updated 1 Jul 2026

"RAG is dead" became the contrarian hot take of 2026, and like most contrarian hot takes, it's half right in a way that makes it mostly wrong. The argument goes: long context windows are enormous now, agents can fetch their own information, so the whole apparatus of retrieval-augmented generation is obsolete.

The people saying this are reacting to something real. But they've misdiagnosed it. Naive RAG — the chunk-and-pray pipeline everyone copy-pasted in 2023 — is genuinely on its way out. Retrieval itself is not dead. It's growing up, and conflating the two is causing teams to either cling to a broken pattern or throw out a technique they still need.

The thing that's actually dying

Naive RAG had a recipe so standard it became a meme. Take your documents. Chop them into fixed-size chunks. Embed each chunk. When a query comes in, embed it too, grab the top-k most similar chunks by cosine similarity, paste them into the prompt, and hope the answer is in there.

This worked well enough in demos and poorly enough in production to keep an entire cottage industry of consultants employed. The failure modes were endless. Fixed chunking splits a concept across two chunks so neither is retrievable. Similarity search returns things that are semantically near but actually irrelevant. The right answer is in chunk eleven but you only pulled the top five. The query is phrased differently from the document so the embeddings don't match. The retrieved context is contradictory and the model picks the wrong part.

People are right that this is dying. It deserves to. It was always a brittle first draft that got mistaken for the finished technique.

Long context killed the easy cases — and that's good

The first thing eroding naive RAG is the sheer size of modern context windows. When your entire knowledge base fits comfortably in context, the whole retrieval apparatus is just overhead. You can hand the model everything and let it sort out what's relevant, which it's often better at than your similarity search was.

For small-to-medium corpora, this is a genuine simplification, and you should take it. If your knowledge fits in context, retrieving over it is premature optimization. Skip the vector database, skip the chunking strategy, skip the whole pipeline. Less machinery, fewer failure modes, better results. The "RAG is dead" crowd is correct that a lot of RAG deployments were solving a problem the user didn't have.

But "a lot" is not "all," and this is where the take overreaches. Long context is slow and expensive at the high end — stuffing a million tokens into every call is a real latency and cost hit. It degrades on genuinely large corpora that don't fit no matter how big the window gets. And it does nothing for knowledge that changes constantly, needs access control, or must be cited to a specific source. For those, you still need retrieval.

Agentic retrieval is where it's actually going

The more interesting evolution isn't "no retrieval" — it's smarter retrieval, where the model is in charge of the retrieval process instead of being a passive recipient of whatever the pipeline shoved at it.

Newsletter

Like this decoded read? Get the next one in your inbox.

One sharp take on the latest in AI for PMs, founders and operators. Every few days. Free.

Free forever. Unsubscribe anytime. No spam.

In naive RAG, retrieval happens once, blindly, before the model does anything. The model has no say in what gets fetched and no recourse if the fetch was bad. Agentic retrieval flips this. The model decides whether it even needs to retrieve, formulates its own queries, searches, looks at what came back, judges whether it's enough, and searches again with a refined query if not. It's a reasoning loop, not a one-shot pipeline.

This fixes the deepest flaw in naive RAG: the assumption that you can determine what's relevant before the model has started thinking about the problem. Often you can't. The relevant information only becomes clear once reasoning is underway, which is exactly when an agent can go fetch it. Most serious retrieval work in 2026 is moving this direction, and it's strictly more capable than the static pipeline it replaces.

Retrieval vs fine-tuning is still a confused debate

While we're killing bad takes: "RAG vs fine-tuning" is a false choice that keeps getting framed as a fork in the road. They solve different problems.

Retrieval is for knowledge — facts that change, that are large, that need citation and access control. You retrieve the current policy, the latest data, the specific customer's records. Fine-tuning is for behavior — how the model acts, the format it produces, the tone it takes, the domain conventions it follows. You fine-tune to make the model reliably output your structured format or adopt your house style.

The sophisticated systems use both. Fine-tune for behavior, retrieve for knowledge. Asking which one to "use instead" is like asking whether to use a steering wheel or an engine. The debate persists mostly because each is sold by people incentivized to position it as the whole answer.

The takeaway

"RAG is dead" is the kind of take that's catchy precisely because it's too simple. Naive chunk-and-retrieve is dying, and good riddance — it was a fragile first attempt that got canonized too early. Long context legitimately eliminated the easy cases, and you should embrace that simplicity wherever your corpus fits.

But retrieval as a discipline is healthier than ever. It's getting agentic, selective, and smart about when it's even needed. The right mental model in 2026 isn't "RAG or not." It's a toolkit: skip retrieval when long context suffices, use agentic retrieval when the corpus is large or fresh or controlled, and fine-tune for behavior on top of all of it. The teams winning aren't the ones who declared RAG dead or clung to the 2023 pipeline. They're the ones who matched the retrieval strategy to the actual problem.

Frequently asked questions

Is RAG dead in 2026? No. What's dying is naive RAG — chunk every document, embed it, retrieve the top-k by similarity, and stuff it into the prompt. That approach was always fragile. Retrieval as a concept is more relevant than ever; it's just getting smarter, more agentic, and more selective about when it's even needed.

Does long context make RAG unnecessary? For small-to-medium corpora that fit comfortably in context, sometimes yes — and you should take that simplicity when you can get it. But long context is slow, expensive, and degrades on very large or constantly changing knowledge bases. Retrieval still wins when the corpus is huge, fresh, or access-controlled.

What is agentic RAG? Agentic RAG is when the model decides whether, what, and how to retrieve, often across multiple steps — reformulating queries, searching iteratively, and judging whether it has enough information. It replaces the rigid single-shot retrieve-then-generate pipeline with a reasoning loop, and it's where most serious RAG work is heading.

When should I use RAG versus fine-tuning? Use retrieval for knowledge that changes, that's large, or that needs to be cited and access-controlled. Use fine-tuning for behavior, format, and tone — teaching the model how to act, not what facts to know. They solve different problems and are often used together, not as alternatives.

Want more sharp takes on AI tooling? Browse the AI Decoded archive.

Frequently asked

Is RAG dead in 2026?

No. What's dying is naive RAG — chunk every document, embed it, retrieve the top-k by similarity, and stuff it into the prompt. That approach was always fragile. Retrieval as a concept is more relevant than ever; it's just getting smarter, more agentic, and more selective about when it's even needed.

Does long context make RAG unnecessary?

For small-to-medium corpora that fit comfortably in context, sometimes yes — and you should take that simplicity when you can get it. But long context is slow, expensive, and degrades on very large or constantly changing knowledge bases. Retrieval still wins when the corpus is huge, fresh, or access-controlled.

What is agentic RAG?

Agentic RAG is when the model decides whether, what, and how to retrieve, often across multiple steps — reformulating queries, searching iteratively, and judging whether it has enough information. It replaces the rigid single-shot retrieve-then-generate pipeline with a reasoning loop, and it's where most serious RAG work is heading.

When should I use RAG versus fine-tuning?

Use retrieval for knowledge that changes, that's large, or that needs to be cited and access-controlled. Use fine-tuning for behavior, format, and tone — teaching the model how to act, not what facts to know. They solve different problems and are often used together, not as alternatives.