
🤖 Ghostwritten by Claude Opus 4.6 · Fact-checked & edited by GPT 5.4
When a retrieval agent answers before it checks whether relevant documents even exist, two things usually happen: it fabricates an answer or asks a human a question the system could have answered itself. The fix is straightforward: check the index first, retrieve content second. In this pattern, an agent starts with a cheap metadata-only inventory call that returns signals like document count, date range, and coverage. Only if that inventory shows relevant evidence does the agent proceed to a content-retrieval step such as ask-docs.
That is the core of the inventory-before-ask-docs doctrine. It is less a novel algorithm than an operational rule: separate evidence availability from evidence extraction. In practice, that separation improves reliability, makes citation requirements enforceable, and reduces unnecessary exposure of document contents. This article explains the failure mode the doctrine addresses, the two-step contract itself, how the indexing pipeline makes it feasible, and why the pattern also improves security.
TL;DR: If an agent retrieves content before confirming relevant documents exist, it is more likely to hallucinate or escalate unnecessarily.
Before this pattern is enforced, a common workflow looks like this: an agent needs a fact from a document, such as a contract term, invoice date, or transaction record, so it sends a natural-language query directly to a retrieval endpoint. That shortcut creates two predictable failure modes.
If the index has no relevant documents for the query, retrieval may still return weakly related chunks or nothing useful at all. Without a clear no-evidence signal, the model may complete the answer from prior knowledge or pattern-matching and present it as though it were document-grounded. That is a standard retrieval-augmented generation failure: the retrieval layer did not support the answer, but the model answered anyway.
The opposite problem is also common. Relevant documents do exist, but the agent lacks a quick way to confirm coverage, so it asks a human whether the system has documents about the topic before proceeding. That is safer than fabrication, but it wastes time. If the system can answer the coverage question from metadata, a human should not need to.
Both failures stem from the same mistake: the agent tries to extract evidence before it has established whether the evidence base is likely to contain what it needs.
TL;DR: The doctrine enforces a two-step contract: first inspect metadata to confirm coverage, then retrieve cited content only when evidence exists.
The doctrine is simple:
The first call is a scoped inventory query. It should return metadata such as:
| Field | What It Returns | Why It Matters |
|---|---|---|
| count | Number of matching documents | Tells the agent whether any evidence exists in scope |
| date_range | Earliest and latest matching document dates | Helps assess recency and coverage |
| top_counterparties | Frequently occurring entities in matching docs | Helps validate whether the scope is correct |
| coverage | Categories or document types represented | Shows what kinds of evidence are available |
This step should not return document contents. The point is to answer a narrow question: does the index appear to contain relevant evidence for this request?
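To make the shape of that first call concrete, here is a minimal Python sketch of an inventory function that reads only metadata. The `DocMeta` schema, the `inventory` signature, and the sample records are illustrative assumptions, not the API of any particular system; only the four result fields come from the table above.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DocMeta:
    """Metadata for one indexed document (hypothetical schema, no body text)."""
    doc_id: str
    doc_date: date
    counterparty: str
    category: str


@dataclass(frozen=True)
class InventoryResult:
    """The four inventory fields described in the table."""
    count: int
    date_range: Optional[tuple[date, date]]
    top_counterparties: list[str]
    coverage: list[str]


def inventory(index: list[DocMeta],
              counterparty: Optional[str] = None,
              category: Optional[str] = None,
              top_n: int = 3) -> InventoryResult:
    """Summarize metadata for documents in scope; contents are never read."""
    matches = [
        m for m in index
        if (counterparty is None or m.counterparty == counterparty)
        and (category is None or m.category == category)
    ]
    if not matches:
        # Explicit no-evidence signal for the caller.
        return InventoryResult(0, None, [], [])
    dates = [m.doc_date for m in matches]
    by_party = Counter(m.counterparty for m in matches)
    return InventoryResult(
        count=len(matches),
        date_range=(min(dates), max(dates)),
        top_counterparties=[name for name, _ in by_party.most_common(top_n)],
        coverage=sorted({m.category for m in matches}),
    )


# Hypothetical sample data for illustration only.
SAMPLE_INDEX = [
    DocMeta("d1", date(2023, 1, 15), "Acme", "invoice"),
    DocMeta("d2", date(2023, 6, 2), "Acme", "contract"),
    DocMeta("d3", date(2024, 3, 9), "Globex", "invoice"),
]
```

Calling `inventory(SAMPLE_INDEX, counterparty="Acme")` reports two matching documents across the contract and invoice categories, while a scope with no matches returns a count of zero and no date range.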
If inventory indicates relevant coverage, the agent can proceed to ask-docs or an equivalent retrieval step. That second call should return evidence that can be cited and checked, such as:
The grounding rule is straightforward: if an answer depends on a document fact, the answer should be traceable to returned evidence. If the evidence is absent or incomplete, the agent should say so rather than fill the gap from model memory.
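One way to enforce the grounding rule mechanically is to compare the documents an answer cites against the evidence the retrieval step actually returned. The sketch below assumes a hypothetical `Evidence` record with a `doc_id` and a `passage`; real systems will carry richer citation data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """One returned passage and the document it came from (hypothetical shape)."""
    doc_id: str
    passage: str


def ungrounded_citations(cited_doc_ids: list[str],
                         evidence: list[Evidence]) -> list[str]:
    """Return cited document ids that do not appear in the returned evidence."""
    returned = {e.doc_id for e in evidence}
    return [doc_id for doc_id in cited_doc_ids if doc_id not in returned]


def is_grounded(cited_doc_ids: list[str], evidence: list[Evidence]) -> bool:
    """True only if the answer cites something and every citation was returned.

    An answer with no citations at all is treated as ungrounded, since a
    document fact with no traceable source cannot be checked.
    """
    return bool(cited_doc_ids) and not ungrounded_citations(cited_doc_ids, evidence)
```

A check like this only verifies that citations point at returned documents. It does not verify that the cited passage actually supports the claim, which needs a separate review step.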
The most important moment is between the two steps. If inventory returns count: 0 for the scoped query, the system has a clear answer about coverage: it has no matching documents in that scope. At that point, the correct behavior is to decline to answer from documents, not to guess.
That is why this pattern is effective against hallucination. It converts an ambiguous internal question—"Do I know this?"—into a concrete system question: "Does the indexed evidence base contain relevant documents?"
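The gate between the two steps can be sketched as a small wrapper. Here `inventory_fn` and `ask_docs_fn` are placeholders for whatever inventory and content-retrieval calls a system exposes; their signatures and the returned dictionary shapes are assumptions made for illustration.

```python
from typing import Callable

NO_EVIDENCE_MESSAGE = (
    "No matching documents in scope; declining to answer from documents."
)


def answer_with_gate(scope: dict,
                     question: str,
                     inventory_fn: Callable[[dict], dict],
                     ask_docs_fn: Callable[[dict, str], dict]) -> dict:
    """Run inventory first and retrieve content only if coverage exists."""
    inv = inventory_fn(scope)
    if inv.get("count", 0) == 0:
        # Decline explicitly; the content-retrieval call is never made.
        return {
            "status": "no_evidence",
            "message": NO_EVIDENCE_MESSAGE,
            "evidence": [],
        }
    result = ask_docs_fn(scope, question)
    return {
        "status": "evidence_retrieved",
        "inventory": inv,
        "evidence": result.get("evidence", []),
    }
```

Because the decline path returns before `ask_docs_fn` is invoked, a zero-count scope produces neither a content read nor an opportunity for the model to improvise from weak matches.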
TL;DR: Inventory-first works because metadata can be computed and indexed ahead of time, making coverage checks fast and cheap.
This pattern only works if inventory is materially cheaper than full retrieval. In most document systems, that means metadata has to be prepared during ingestion rather than assembled from raw content at query time.
A typical pipeline has three stages:
The key design choice is precomputation. If counts, date ranges, and category coverage are materialized during ingestion, inventory queries can be answered with lightweight indexed lookups. If those values must be inferred by scanning document content on every request, the cost and latency advantage largely disappears.
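As a rough illustration of precomputation, the sketch below maintains a small summary table at ingestion time so that an inventory query becomes a single keyed lookup. It uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in; the table names, columns, and per-category granularity are assumptions, not a description of any specific production schema.

```python
import sqlite3


def open_index() -> sqlite3.Connection:
    """Create an in-memory index with a documents table and a summary table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE documents (
            doc_id TEXT PRIMARY KEY,
            category TEXT NOT NULL,
            doc_date TEXT NOT NULL,   -- ISO 8601 dates sort correctly as text
            body TEXT NOT NULL
        );
        CREATE TABLE inventory_summary (
            category TEXT PRIMARY KEY,
            doc_count INTEGER NOT NULL,
            min_date TEXT NOT NULL,
            max_date TEXT NOT NULL
        );
        """
    )
    return conn


def ingest(conn: sqlite3.Connection, doc_id: str, category: str,
           doc_date: str, body: str) -> None:
    """Store the document and update the materialized summary in one transaction."""
    with conn:
        conn.execute(
            "INSERT INTO documents VALUES (?, ?, ?, ?)",
            (doc_id, category, doc_date, body),
        )
        cur = conn.execute(
            """
            UPDATE inventory_summary
               SET doc_count = doc_count + 1,
                   min_date = MIN(min_date, ?),
                   max_date = MAX(max_date, ?)
             WHERE category = ?
            """,
            (doc_date, doc_date, category),
        )
        if cur.rowcount == 0:
            # First document in this category: create its summary row.
            conn.execute(
                "INSERT INTO inventory_summary VALUES (?, 1, ?, ?)",
                (category, doc_date, doc_date),
            )


def inventory(conn: sqlite3.Connection, category: str) -> dict:
    """Answer a coverage question from the summary table without reading bodies."""
    row = conn.execute(
        "SELECT doc_count, min_date, max_date FROM inventory_summary WHERE category = ?",
        (category,),
    ).fetchone()
    if row is None:
        return {"count": 0, "date_range": None}
    return {"count": row[0], "date_range": (row[1], row[2])}
```

The trade-off is a little extra work on every write in exchange for inventory reads that never touch the `body` column. Deletions and updates would need matching maintenance logic, which this sketch omits.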
A Postgres-backed index running in Docker is one plausible implementation, but it is not essential to the doctrine itself. The pattern applies equally to other storage and retrieval stacks.
TL;DR: Inventory-first also reduces unnecessary data exposure by keeping metadata checks separate from content retrieval.
Although the main goal is reliability, the pattern has a useful security side effect: it limits when document contents are exposed to an agent.
That does not eliminate security risk. Metadata can still be sensitive in some environments; even document counts, date ranges, or named counterparties may require access controls. But as a design pattern, metadata-first retrieval generally supports least-privilege behavior better than speculative content retrieval.
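One possible way to express that separation in code is to give metadata checks and content reads distinct permissions, so an agent can be allowed to ask about coverage without being allowed to read text. The permission names and function signatures below are hypothetical and exist only to illustrate the idea.

```python
# Hypothetical permission names; real systems will define their own.
INVENTORY_READ = "inventory:read"
CONTENT_READ = "content:read"


def require(granted: frozenset, needed: str) -> None:
    """Raise if the caller was not granted the needed permission."""
    if needed not in granted:
        raise PermissionError(f"missing permission: {needed}")


def check_coverage(granted: frozenset, matching_count: int) -> dict:
    """Metadata-only step; still access-controlled because metadata can be sensitive."""
    require(granted, INVENTORY_READ)
    return {"count": matching_count}


def read_content(granted: frozenset, inventory_result: dict,
                 passages: list) -> list:
    """Content step; needs its own permission and a non-empty inventory."""
    require(granted, CONTENT_READ)
    if inventory_result["count"] == 0:
        # Nothing in scope, so no document text is exposed.
        return []
    return list(passages)
```

An agent holding only the inventory permission can learn whether evidence exists but gets a `PermissionError` if it attempts a content read.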
TL;DR: Enforcing inventory-first improves trust by making no-evidence responses explicit and citation-backed answers easier to enforce.
The practical effects of this doctrine are easy to understand even without internal performance numbers.
The broader lesson is that trustworthy retrieval systems separate three questions that are often blurred together:
Inventory answers the first question. Content retrieval answers the second and third. Keeping those stages distinct makes the system easier to reason about and harder to misuse.
It is a retrieval rule that requires agents to check metadata coverage before retrieving document content. The first step asks whether relevant documents exist in scope; the second step retrieves cited evidence only if they do.
It gives the agent an explicit no-evidence signal. If the inventory result shows no matching documents, the agent can decline instead of trying to answer from model memory and presenting that answer as document-grounded.
Because it can be served from indexed metadata such as counts, dates, and categories. Full retrieval usually requires searching content, selecting passages, and packaging evidence for the model, which costs more in compute and latency.
Usually, yes. It reduces unnecessary exposure of document text by reserving content retrieval for tasks that actually need it. That said, metadata may still be sensitive and should be protected accordingly.
No. The doctrine is architectural, not vendor-specific. It can be applied to many RAG systems as long as the platform can separate metadata coverage checks from content retrieval.
Inventory-before-ask-docs is a simple rule with outsized impact: do not ask the model to interpret evidence until the system has confirmed that relevant evidence exists. That separation improves reliability, supports stronger citation discipline, and reduces unnecessary exposure of document contents. For teams building retrieval-heavy agents, it is a practical reminder that many hallucination problems are not model problems first; they are workflow problems.