Blog

Engineering writeups, not marketing.

First-hand notes from production AI agent, industrial IoT, edge firmware, and regulated fintech work. Specific failure modes, real tradeoffs, and the decisions that survived contact with production.

Production AI agents·May 27, 2026

Why AI agents fail in production

Most agents that fail in production do not fail on intelligence. They fail in the seams: data integrity, attribution, idempotency, and regression control around a probabilistic component.

ai llm agents production

Systems design·May 27, 2026

InfluxDB vs TimescaleDB: choosing a time-series store after running both

The choice is less about raw ingest speed than about how your time-series data relates to everything else. If it joins to relational metadata, that drives the decision more than any benchmark.

timeseries databases influxdb timescaledb

Production AI agents·May 27, 2026

When tool-calling beats RAG: agents over structured systems

RAG is the right tool for documents. Most production agents are pointed at a system with an API, and there, typed tool-calling beats retrieval. Where embeddings and MCP actually fit.

ai llm agents rag

Systems design·May 27, 2026

Securing webhooks: signature verification and idempotency are most of the job

A webhook is an unauthenticated POST from the internet you are about to act on. Verify the sender signed it, make processing idempotent, and most webhook incidents never happen.

webhooks security fintech reliability

Industrial IoT / edge firmware·May 27, 2026

Modbus RTU vs TCP: what changes when you actually implement both

At the register level, Modbus is one protocol with two transports. In the code that talks to real devices, the batch limits, framing, and failure modes differ enough that the same poller needs different handling for each.

modbus iot embedded protocols

Production AI agents·May 27, 2026

Structured output vs tool-calling: two jobs people keep conflating

Structured output constrains the shape of a one-shot answer. Tool-calling lets the model fetch data and take actions. Using one where the other fits is a common reason an agent is harder to build than it needs to be.

ai llm agents structured-output

Systems design·May 27, 2026

Audit logging, three ways: a web app, a settlement system, and an AI agent

"Add an audit log" is one phrase that produces three different systems. What each captures, where it is enforced, and the tradeoff each one makes.

architecture audit fintech security

Production AI agents·May 27, 2026

What an AI agent should remember, and for how long

An agent has at least four kinds of memory with different lifetimes and storage. Conflating them is how you get a system that forgets what it just did or drags a whole conversation into every model call.

ai llm agents memory

Systems design·May 27, 2026

Idempotency in practice: webhooks, scheduled jobs, and agent tools

An operation is idempotent if running it twice has the same effect as once. Three places that matters in production, and how the mechanism differs in each.

architecture distributed-systems reliability

Industrial IoT / edge firmware·May 27, 2026

IEC 60870-5-104 in practice: ASDUs, IOAs, and the interrogation that hangs

The IEC 104 spec is precise and the reference material is thin, so most of what you need lives in the parsing code. Type IDs, normalized values, sequential addressing, and the firewall failure that costs a day.

iec104 scada protocols iot

Production AI agents·May 27, 2026

We truncated tool output to fit the context window. It silently corrupted the agent's answers.

Character-truncating a tool result keeps you under budget while destroying the data's meaning. Here is the field-level compression and four-level token budget we use instead.

ai llm agents mlops

Industrial IoT / edge firmware·May 27, 2026

A single lithium D-cell browns out your Cat.1 modem mid-transmit

Battery-powered cellular sensors fail on impedance, not capacity. Here is why a Cat.1 modem resets during transmission on an ER34615, and the pulse-buffer fix that holds the rail.

embedded iot firmware hardware

Regulated fintech·May 27, 2026

A nine-state settlement workflow and the type mismatch the compiler missed

Money workflows accrete states, and the worst bug in ours was a type mismatch static typing should have caught: one entity stored an identifier as a string, another as a number, and the join silently returned nothing.

fintech java distributed-systems architecture

Industrial IoT / edge firmware·May 27, 2026

Designing a multi-protocol gateway for brownfield industrial sites

A single industrial site can speak Modbus, DL/T 698, and IEC 60870-5-104 at once, with an MQTT cloud on top. The gateway in the middle is where the real design work lives.

iot modbus gateways embedded

Production AI agents·May 27, 2026

Self-healing agent output: deterministic checks first, a judge model second

Asking a model to re-check its own work on every response is slow and unreliable. Here is a two-pass repair design, and the revert rule that stops a fix from making things worse.

ai llm agents evals

Industrial control systems·May 27, 2026

Tracking battery state of charge when eight sensors disagree

A storage site has one state of charge per pack, reported independently, arriving out of order, and drifting with age. Turning that into one number the control loop can act on is where the bugs live.

control-systems distributed-systems iot energy