The default architecture for "chat with your data" is retrieval-augmented generation: embed the documents, embed the question, pull the nearest chunks into context, let the model answer. RAG is the right tool when your data really is documents. The trouble is that most production agents are pointed at a system with an API, and there RAG fights the grain of the data.

What RAG costs you on structured data

When the underlying source is a monitoring platform, a database, or any API with a schema, retrieval works against you. You flatten structured records into text to embed them, you fetch by fuzzy similarity instead of exact query, and you hope the chunks that came back contain the rows the question actually needed. A question like "which sites had an alarm in the last hour" has an exact answer the system can compute, and vector search can only approximate it. The deeper problem is the failure mode: the model cannot tell "the retrieval missed it" from "there is nothing to report," so a retrieval gap turns into a confident wrong answer.
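To make the gap concrete, here is a minimal sketch that puts an exact filter next to a toy top-k retrieval over the same records. Everything in it is an assumption for illustration: the site names, the timestamps, the `flatten` format, and the word-overlap scoring, which stands in for a real embedding model.

```python
from datetime import datetime, timedelta

# Hypothetical data: site names and timestamps are invented for illustration.
NOW = datetime(2024, 1, 1, 12, 0)
ALARMS = [
    {"site": "alpha", "raised_at": NOW - timedelta(minutes=5)},
    {"site": "bravo", "raised_at": NOW - timedelta(minutes=20)},
    {"site": "charlie", "raised_at": NOW - timedelta(minutes=40)},
    {"site": "delta", "raised_at": NOW - timedelta(minutes=55)},
    {"site": "echo", "raised_at": NOW - timedelta(hours=3)},
]


def sites_with_recent_alarms(alarms, now, window):
    """Exact answer: evaluate the time predicate on the structured field."""
    return sorted({a["site"] for a in alarms if now - a["raised_at"] <= window})


def flatten(alarm):
    """Turn a structured record into text, as an embedding pipeline would."""
    return f"alarm at site {alarm['site']} raised at {alarm['raised_at'].isoformat()}"


def top_k_chunks(question, chunks, k):
    """Toy retrieval: rank chunks by shared words (a stand-in for vector similarity)."""
    question_words = set(question.lower().split())

    def overlap(chunk):
        return len(question_words & set(chunk.split()))

    return sorted(chunks, key=overlap, reverse=True)[:k]


exact = sites_with_recent_alarms(ALARMS, NOW, timedelta(hours=1))
retrieved = top_k_chunks(
    "which sites had an alarm in the last hour",
    [flatten(a) for a in ALARMS],
    k=3,
)
```

Four sites match the one-hour window, but the retrieval step hands back exactly `k` chunks no matter how many rows qualify, and nothing in the similarity score evaluates the time predicate. A real embedding model would rank differently, but the cap and the missing predicate are structural.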

Tool-calling matches the grain

The alternative is to give the model typed tools and let it call them. We wrap each capability of the underlying platform (list plants, fetch alarms for a plant, pull an inverter's time series) as a function with a typed signature, and the model calls them by name with arguments it chooses. The result comes back as validated structured data, not a similarity score. "Alarms in the last hour" becomes a parameterized call against the system that owns the answer, and the count is exact because the system computed it.
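A minimal sketch of one such tool, assuming a hypothetical `fetch_alarms` capability and an in-memory list standing in for the platform API. The names `Alarm`, `fetch_alarms`, `TOOLS`, and `dispatch` are illustrative, not any real platform's interface, and the records are invented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Alarm:
    plant_id: str
    code: str
    raised_at: datetime


# In-memory stand-in for the platform API; all records are invented.
_NOW = datetime(2024, 1, 1, 12, 0)
_PLATFORM_ALARMS = [
    Alarm("north-1", "GRID_LOSS", _NOW - timedelta(minutes=10)),
    Alarm("north-1", "OVER_TEMP", _NOW - timedelta(hours=5)),
    Alarm("south-2", "COMMS_DOWN", _NOW - timedelta(minutes=30)),
]


def fetch_alarms(plant_id: str, since_minutes: int) -> list[Alarm]:
    """Return alarms for one plant raised within the last `since_minutes`."""
    # Validate the model-chosen arguments before touching the backing system.
    if not isinstance(plant_id, str) or not plant_id:
        raise ValueError("plant_id must be a non-empty string")
    if not isinstance(since_minutes, int) or since_minutes <= 0:
        raise ValueError("since_minutes must be a positive integer")
    cutoff = _NOW - timedelta(minutes=since_minutes)
    return [
        a for a in _PLATFORM_ALARMS
        if a.plant_id == plant_id and a.raised_at >= cutoff
    ]


# The model picks a tool by name and supplies the arguments.
TOOLS = {"fetch_alarms": fetch_alarms}


def dispatch(name: str, arguments: dict) -> list[Alarm]:
    return TOOLS[name](**arguments)


recent = dispatch("fetch_alarms", {"plant_id": "north-1", "since_minutes": 60})
```

Here `recent` holds typed `Alarm` records selected by an exact predicate, so counting them is a matter of `len(recent)` rather than an estimate from retrieved text.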

The failure mode improves too. A tool call either returns data or returns an error the agent can see and react to. There is no silent "the nearest chunk was close enough."
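One way to keep that distinction visible to the agent is to wrap every call in an explicit result envelope. This is a sketch under assumptions: the `ok`/`data`/`error` shape, the `call_tool` helper, and the stub `fetch_alarms` are all hypothetical.

```python
def fetch_alarms(plant_id: str) -> list:
    """Stub tool: a hypothetical lookup with invented plants."""
    alarms_by_plant = {"north-1": [], "south-2": ["COMMS_DOWN"]}
    if plant_id not in alarms_by_plant:
        raise LookupError(f"no such plant: {plant_id}")
    return alarms_by_plant[plant_id]


def call_tool(tools: dict, name: str, arguments: dict) -> dict:
    """Run a tool and report success or failure explicitly."""
    if name not in tools:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        return {"ok": True, "data": tools[name](**arguments)}
    except Exception as exc:
        # The error goes back to the agent instead of being swallowed.
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}


tools = {"fetch_alarms": fetch_alarms}
nothing_to_report = call_tool(tools, "fetch_alarms", {"plant_id": "north-1"})
lookup_failed = call_tool(tools, "fetch_alarms", {"plant_id": "west-9"})
```

An empty `data` list with `ok` set to true means the system looked and found nothing. A false `ok` means the lookup itself failed, and the agent can retry, correct its arguments, or tell the user.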

Where embeddings still earn their place

This is an argument about what to point vectors at, not an argument against them. We do run pgvector, but not over a document corpus. We use it to store the results of previous tool calls, so that in a long conversation the agent can recall what it already fetched by semantic similarity instead of re-calling the API. The thing being embedded is the agent's own working memory. (The memory design is its own writeup.)
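The shape of that memory can be sketched in a few lines. This is not the production design: a Python list stands in for the pgvector table, a bag-of-words counter stands in for a real embedding model, and the `ToolResultMemory` class, its threshold, and the example data are all hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a real system would call an embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToolResultMemory:
    """Stores earlier tool results and recalls them by similarity."""

    def __init__(self, threshold: float = 0.5):
        self._entries = []  # list of (embedding, result) pairs
        self._threshold = threshold

    def remember(self, description: str, result) -> None:
        self._entries.append((embed(description), result))

    def recall(self, query: str):
        """Return the best-matching stored result, or None if nothing is close enough."""
        query_vec = embed(query)
        best_score, best_result = 0.0, None
        for vec, result in self._entries:
            score = cosine(query_vec, vec)
            if score > best_score:
                best_score, best_result = score, result
        return best_result if best_score >= self._threshold else None


memory = ToolResultMemory()
memory.remember("alarms for plant north-1 last hour", ["GRID_LOSS"])
hit = memory.recall("alarms at plant north-1 in the last hour")
miss = memory.recall("inverter time series for inv-7")
```

A hit lets the agent reuse what it already fetched. A miss returns `None`, which is the agent's cue to call the API again.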

Genuine documents (a protocol manual, a fault-code reference, an operating procedure) do belong behind retrieval. We expose those as an explicit search tool the model invokes when it decides it needs reference material, rather than auto-injecting retrieved chunks into every prompt. The model asks for documentation the same way it asks for data, which keeps context clean and retrieval intentional.
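As a rough illustration of retrieval as an explicit tool, the sketch below registers a `search_docs` function and builds a prompt that carries no document text. The document snippets, the keyword scoring, and the `build_prompt` helper are all invented for the example; a real search tool would sit on a proper index.

```python
import re

# Hypothetical reference snippets; the content is a placeholder, not a real manual.
DOCS = {
    "fault-codes": "F012 means DC overvoltage on the inverter input.",
    "procedure": "Isolate the string before inspecting connectors.",
}


def _tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_docs(query: str, limit: int = 3) -> list:
    """Explicit retrieval tool: runs only when the model asks for it."""
    query_tokens = _tokens(query)
    scored = [
        (len(query_tokens & _tokens(text)), doc_id, text)
        for doc_id, text in DOCS.items()
    ]
    matches = sorted((s for s in scored if s[0] > 0), reverse=True)
    return [{"doc_id": doc_id, "text": text} for _, doc_id, text in matches[:limit]]


TOOLS = {"search_docs": search_docs}


def build_prompt(question: str) -> str:
    """The prompt lists available tools; it does not auto-inject retrieved chunks."""
    return f"Tools: {', '.join(sorted(TOOLS))}\nQuestion: {question}"


prompt = build_prompt("what does F012 mean")
results = TOOLS["search_docs"]("what does F012 mean")
```

The prompt stays the same size whether or not the question needs reference material, and the retrieved text enters context only as the result of a call the model chose to make.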

So where does MCP fit?

MCP, the Model Context Protocol, often gets discussed as if it competed with RAG or tool-calling. It does not. It is a transport and packaging standard for exposing tools to a model in a portable way. Once you have decided on tool-calling, MCP is one way to ship those tools so different clients can use them without bespoke glue. The architectural decision (tools versus retrieval) comes first, and MCP is about how you distribute the tools after you have made it.
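To show what "packaging" means in practice, here is a sketch that derives a tool descriptor from a typed function signature. It assumes the descriptor shape MCP uses when a server lists its tools (a `name`, a `description`, and a JSON Schema under `inputSchema`); the `describe_tool` helper and the `fetch_alarms` stub are hypothetical, and an MCP SDK would normally generate this for you.

```python
import inspect

# Minimal mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(func) -> dict:
    """Build a tool descriptor from a function's signature and docstring."""
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": _JSON_TYPES[param.annotation]}
        # Parameters without a default must be supplied by the caller.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "inputSchema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def fetch_alarms(plant_id: str, since_minutes: int = 60) -> list:
    """Fetch alarms for one plant raised in the last N minutes."""
    return []  # stub: a real tool would query the platform


descriptor = describe_tool(fetch_alarms)
```

The function and its types are the architectural decision. The descriptor is just the portable form a client reads to learn what it may call.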

The rule of thumb

Point RAG at documents and tool-calling at systems. If the answer to a question is something a system can compute exactly, give the model the tool to ask, and reserve retrieval for the cases where the source really is unstructured text. Many agent projects that stall in production do so because they reached for retrieval over a system that had an API the whole time.

Where AgentKick fits

We build production AI agent systems on this architecture: typed tool-calling against the systems that own the data, retrieval reserved for real documents, and embeddings used for the agent's own memory. If you are deciding between RAG, tool-calling, or a hybrid for an agent that has to be right, that is the work we do, usually starting with a fixed-scope AI Agent Production-Readiness Review that leads into a phased build.

When tool-calling beats RAG: agents over structured systems