Engineering writeups, not marketing.
First-hand notes from work on production AI agents, industrial IoT, edge firmware, and regulated fintech. Specific failure modes, real tradeoffs, and the decisions that survived contact with production.
Why AI agents fail in production
Most agents that fail in production do not fail on intelligence. They fail in the seams: data integrity, attribution, idempotency, and regression control around a probabilistic component.
InfluxDB vs TimescaleDB: choosing a time-series store after running both
The choice is less about raw ingest speed than about how your time-series data relates to everything else. If it joins to relational metadata, that drives the decision more than any benchmark.
When tool-calling beats RAG: agents over structured systems
RAG is the right tool for documents. Most production agents are pointed at a system with an API, and there, typed tool-calling beats retrieval. Plus where embeddings and MCP actually fit.
Securing webhooks: signature verification and idempotency are most of the job
A webhook is an unauthenticated POST from the internet you are about to act on. Verify the sender signed it, make processing idempotent, and most webhook incidents never happen.
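The signature half of that can be sketched in a few lines. This is a minimal sketch, assuming the sender computes a hex HMAC-SHA256 over the raw request body with a shared secret; real providers differ in header names, encodings, and timestamp handling, and the names `sign` and `is_authentic` are hypothetical.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw body (assumed scheme, not a specific provider's)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def is_authentic(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Recompute the signature over the exact bytes received and compare in constant time."""
    expected = sign(secret, body)
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), received_signature.encode())


secret = b"shared-secret"            # hypothetical value for illustration
body = b'{"event_id": "evt_1"}'      # raw bytes, before any JSON parsing
signature = sign(secret, body)

print(is_authentic(secret, body, signature))          # genuine payload
print(is_authentic(secret, body + b" ", signature))   # body altered in transit
```

The check runs on the raw bytes because re-serializing parsed JSON can change whitespace or key order and invalidate an otherwise genuine signature.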
Modbus RTU vs TCP: what changes when you actually implement both
At the register level, Modbus is one protocol with two transports. In the code that talks to real devices, the batch limits, framing, and failure modes differ enough that the same poller needs different handling for each.
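The framing difference is easy to see on a single request. A minimal sketch, assuming a Read Holding Registers request (function code 0x03): the PDU is identical, RTU wraps it in a unit address and a CRC-16, and TCP wraps it in an MBAP header with no CRC. The helper names are hypothetical.

```python
import struct


def crc16_modbus(data: bytes) -> int:
    """CRC-16/MODBUS: initial value 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def read_holding_pdu(start: int, count: int) -> bytes:
    """Function code 0x03, start address, register count (all big-endian)."""
    return struct.pack(">BHH", 0x03, start, count)


def rtu_frame(unit_id: int, pdu: bytes) -> bytes:
    """RTU: unit address + PDU + CRC, with the CRC low byte sent first."""
    body = bytes([unit_id]) + pdu
    return body + struct.pack("<H", crc16_modbus(body))


def tcp_frame(transaction_id: int, unit_id: int, pdu: bytes) -> bytes:
    """TCP: MBAP header (transaction id, protocol id 0, length, unit id) + PDU."""
    return struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id) + pdu


pdu = read_holding_pdu(start=0, count=1)
print(rtu_frame(1, pdu).hex())     # 010300000001840a
print(tcp_frame(1, 1, pdu).hex())  # 000100000006010300000001
```

The transaction id lets a TCP client match responses to requests, while an RTU master on a serial line has to rely on timing and one outstanding request at a time.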
Structured output vs tool-calling: two jobs people keep conflating
Structured output constrains the shape of a one-shot answer. Tool-calling lets the model fetch data and take actions. Using one where the other fits is a common reason an agent is harder to build than it needs to be.
Audit logging, three ways: a web app, a settlement system, and an AI agent
"Add an audit log" is one phrase that produces three different systems. What each captures, where it is enforced, and the tradeoff each one makes.
What an AI agent should remember, and for how long
An agent has at least four kinds of memory with different lifetimes and storage. Conflating them is how you get a system that forgets what it just did or drags a whole conversation into every model call.
Idempotency in practice: webhooks, scheduled jobs, and agent tools
An operation is idempotent if running it twice has the same effect as running it once. Three places where that matters in production, and how the mechanism differs in each.
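One common mechanism is an idempotency key: record the result under a caller-supplied key and return the stored result on a repeat. A minimal sketch, assuming an in-memory dict stands in for a durable store with a unique constraint; `IdempotentRunner` and the key format are hypothetical, and this version is not safe under concurrent callers.

```python
from typing import Callable, Dict


class IdempotentRunner:
    """Runs an operation at most once per key and replays the stored result."""

    def __init__(self) -> None:
        # Assumption: production code would use a database table or similar.
        self._results: Dict[str, object] = {}

    def run(self, key: str, operation: Callable[[], object]) -> object:
        if key in self._results:
            return self._results[key]   # duplicate delivery: no side effect
        result = operation()
        self._results[key] = result
        return result


balance = {"cents": 0}


def credit_500() -> int:
    balance["cents"] += 500
    return balance["cents"]


runner = IdempotentRunner()
runner.run("evt_1", credit_500)
runner.run("evt_1", credit_500)   # same key, operation is not run again
print(balance["cents"])           # 500
```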
IEC 60870-5-104 in practice: ASDUs, IOAs, and the interrogation that hangs
The IEC 104 spec is precise and the reference material is thin, so most of what you need lives in the parsing code. Type IDs, normalized values, sequential addressing, and the firewall failure that costs a day.
We truncated tool output to fit the context window. It silently corrupted the agent's answers.
Character-truncating a tool result keeps you under budget while destroying the data's meaning. Here is the field-level compression and four-level token budget we use instead.
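The difference can be illustrated on a small JSON result. This is a sketch of the general idea only, not the field-level compression or token budget described in the article: it counts characters rather than tokens, and `truncate_chars`, `compress_fields`, and the sample rows are hypothetical.

```python
import json


def truncate_chars(payload: str, limit: int) -> str:
    """Naive approach: cut the serialized result at a character limit."""
    return payload[:limit]


def compress_fields(rows, keep, max_rows):
    """Keep only the named fields and the first max_rows rows, and say what was dropped."""
    kept = [{k: row[k] for k in keep if k in row} for row in rows[:max_rows]]
    return {"rows": kept, "omitted_rows": max(0, len(rows) - max_rows)}


rows = [
    {"id": i, "status": "ok", "amount": 100 + i, "debug": "x" * 50}
    for i in range(5)
]
raw = json.dumps(rows)

try:
    json.loads(truncate_chars(raw, 120))
except json.JSONDecodeError:
    print("truncated result is no longer valid JSON")

compact = compress_fields(rows, keep=["id", "status"], max_rows=3)
print(json.dumps(compact))
```

The compressed form stays parseable and tells the model explicitly that rows were omitted, instead of ending mid-record.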
A single lithium D-cell browns out your Cat.1 modem mid-transmit
Battery-powered cellular sensors fail on impedance, not capacity. Here is why a Cat.1 modem resets during transmission on an ER34615, and the pulse-buffer fix that holds the rail.
A nine-state settlement workflow and the type mismatch the compiler missed
Money workflows accrete states, and the worst bug in ours was a type mismatch static typing should have caught: one entity stored an identifier as a string, another as a number, and the join silently returned nothing.
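The failure shape is easy to reproduce. A minimal sketch in Python, assuming one side keys accounts by a string identifier and the other carries it as an integer; the entity and field names are hypothetical, and the original system's language and schema are not stated here.

```python
payments = [{"payment_id": 1, "account_id": 1001, "amount": 250}]   # numeric id
accounts_by_id = {"1001": {"owner": "acme"}}                        # string id


def join_naive(payments, accounts_by_id):
    """Silently drops every row whose key type does not match."""
    return [
        (p, accounts_by_id[p["account_id"]])
        for p in payments
        if p["account_id"] in accounts_by_id
    ]


def join_normalized(payments, accounts_by_id):
    """Normalizes both sides to str and fails loudly on a genuine miss."""
    normalized = {str(k): v for k, v in accounts_by_id.items()}
    joined = []
    for p in payments:
        key = str(p["account_id"])
        if key not in normalized:
            raise KeyError(f"no account for payment {p['payment_id']}")
        joined.append((p, normalized[key]))
    return joined


print(len(join_naive(payments, accounts_by_id)))       # 0, and no error raised
print(len(join_normalized(payments, accounts_by_id)))  # 1
```

An empty join on money data is usually a bug rather than a valid result, so raising on an unmatched row is a reasonable default.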
Designing a multi-protocol gateway for brownfield industrial sites
A single industrial site can speak Modbus, DL/T 698, and IEC 60870-5-104 at once, with an MQTT cloud on top. The gateway in the middle is where the real design work lives.
Self-healing agent output: deterministic checks first, a judge model second
Asking a model to re-check its own work on every response is slow and unreliable. Here is a two-pass repair design, and the revert rule that stops a fix from making things worse.
Tracking battery state of charge when eight sensors disagree
A storage site has one state of charge per pack, reported independently, arriving out of order, and drifting with age. Turning that into one number the control loop can act on is where the bugs live.