Idempotency Patterns for AI Agent Actions: Why Exactly-Once Execution Is an Illusion

The Retry Problem Nobody Talks About

Every distributed systems engineer knows the Two Generals' Problem. Every database engineer knows about write-ahead logs. But most AI agent builders -- even experienced ones -- ship agents that can charge a customer twice, send duplicate emails, or create duplicate records, because they treat tool calls like pure functions in a single-process program.

They are not. An agent tool call is a distributed transaction with at least three participants: the LLM inference endpoint, the agent orchestrator, and the external system the tool interacts with. Any of these can fail independently. The network between them can partition. Timeouts can fire before completion signals arrive. And the LLM itself can decide, mid-conversation, to retry a tool call it already issued because the response was slow.

The result is a class of bugs that are invisible in development and catastrophic in production. The payment API returns a 200 but the agent's HTTP client times out before receiving it. The orchestrator retries. The customer gets charged twice. The agent reports success on the second attempt, unaware the first one also succeeded.

This is not hypothetical. Teams running agents with real-world tool access tend to hit this soon after production deployment. The question is whether you designed for it or whether you discover it in your billing disputes queue.

Why Exactly-Once Is Impossible

Let us be precise about what we mean. In distributed systems, message delivery has three semantic levels:

At-most-once: send and forget. The tool call might not execute. Acceptable for logging, not for business operations.

At-least-once: retry until acknowledged. The tool call will execute, but might execute multiple times. This is what most agent frameworks provide by default.

Exactly-once: the tool call executes precisely one time. This is what everyone wants. It cannot be guaranteed by the sender alone in any distributed system; it requires cooperation from the receiving side.

The impossibility is not theoretical hand-waving. It follows from the fundamental nature of network communication: you cannot distinguish between "the message was lost" and "the acknowledgment was lost." When your agent calls a payment API and gets a timeout, it cannot know whether the payment was processed. Retrying risks duplication. Not retrying risks loss. There is no third option at the network level.

The practical solution is not to achieve exactly-once delivery but to make at-least-once delivery safe through idempotency. If every tool call can be safely re-executed without changing the outcome, then retries become harmless. The system converges to the correct state regardless of how many times any individual call is attempted.
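To make the convergence claim concrete, here is a minimal sketch in Python. The FakePaymentServer class, its charge method, and the call_with_retry helper are hypothetical names invented for illustration, and the lost acknowledgment is simulated by raising TimeoutError after the side effect has already been applied.

```python
class FakePaymentServer:
    """Hypothetical in-memory payment service that deduplicates by idempotency key."""

    def __init__(self, drop_first_ack=True):
        self.charges = []            # every charge actually applied
        self.results_by_key = {}     # idempotency key -> stored result
        self._drop_next_ack = drop_first_ack

    def charge(self, amount_cents, idempotency_key=None):
        if idempotency_key is not None and idempotency_key in self.results_by_key:
            result = self.results_by_key[idempotency_key]   # replay, no new charge
        else:
            self.charges.append(amount_cents)               # the side effect
            result = {"charge_id": len(self.charges), "amount_cents": amount_cents}
            if idempotency_key is not None:
                self.results_by_key[idempotency_key] = result
        if self._drop_next_ack:
            # Simulate the acknowledgment being lost after the work was done.
            self._drop_next_ack = False
            raise TimeoutError("response lost in transit")
        return result


def call_with_retry(server, amount_cents, idempotency_key=None, max_attempts=3):
    """At-least-once delivery: retry on timeout until a response arrives."""
    for _ in range(max_attempts):
        try:
            return server.charge(amount_cents, idempotency_key)
        except TimeoutError:
            continue
    raise RuntimeError("gave up after retries")


unsafe = FakePaymentServer()
call_with_retry(unsafe, 5000)                            # no key: the retry charges again
safe = FakePaymentServer()
call_with_retry(safe, 5000, idempotency_key="order-42")  # keyed: the retry is a replay
print(len(unsafe.charges), len(safe.charges))
```

The client code is identical in both runs. The only difference is whether the receiver can recognize the retry, which is the whole point of the key.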

This is the same insight that powers production tool use patterns at scale. The engineering discipline around tool calls is what separates demo agents from production agents.

Idempotency Key Patterns for Agent Tool Calls

The foundation of agent idempotency is the idempotency key -- a unique identifier for each logical operation that allows the receiving system to recognize and deduplicate retries.

Hash-Based Keys

The simplest pattern: hash the tool name, arguments, and a conversation-scoped nonce (a value that stays fixed across retries of the same call) to produce a deterministic key. If the agent retries the exact same call, it produces the exact same key, and the receiving system returns the cached result instead of re-executing.
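A minimal sketch of such a key function, assuming the arguments are JSON-serializable. The function name make_idempotency_key and the scope_id parameter (standing in for the conversation-scoped nonce) are illustrative choices, not part of any particular framework.

```python
import hashlib
import json


def make_idempotency_key(tool_name, arguments, scope_id):
    """Derive a deterministic key from the tool name, arguments, and a scope id.

    scope_id stands in for the conversation-scoped nonce; how it is
    generated is an assumption left to the orchestrator.
    """
    # Canonical JSON: sorted keys and fixed separators, so argument order
    # and whitespace do not change the hash.
    canonical = json.dumps(
        {"tool": tool_name, "args": arguments, "scope": scope_id},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


key_a = make_idempotency_key("charge", {"amount": 5000, "to": "acct_1"}, "conv-7")
key_b = make_idempotency_key("charge", {"to": "acct_1", "amount": 5000}, "conv-7")
key_c = make_idempotency_key("charge", {"amount": 5000, "to": "acct_1"}, "conv-8")
```

Here key_a and key_b match because canonicalization removes argument-order differences, while key_c differs because the scope changed. Skipping the canonicalization step is a common way to end up with retries that hash to different keys.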

This works well for deterministic tools where the same arguments always mean the same intent. A database query with identical parameters is the same query. A payment with the same amount, recipient, and reference is the same payment.

The failure mode is semantic ambiguity. "Send a message to Alice saying hello" called twice might be a retry (send once) or an intentional repeat (send twice). Hash-based keys cannot distinguish intent from repetition without additional context.

Semantic Deduplication

For tools where intent matters, semantic deduplication examines the conversation context around the tool call. If the agent issued the same tool call in the same conversational turn, it is almost certainly a retry. If it issued the same call in a new turn after user input, it might be intentional.

The implementation uses a sliding window over the conversation history. Each tool call is compared against recent calls using both argument matching and conversation position. A call to send_email(to="alice", body="hello") at turn 5 is deduplicated against the same call at turn 5 (retry) but not against the same call at turn 8 (new intent after intervening conversation).
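One way to sketch the sliding window, under simplifying assumptions: the orchestrator supplies an integer turn number, arguments are flat keyword arguments with hashable values, and "same turn" is the only conversational signal used. The class name TurnWindowDeduplicator is hypothetical.

```python
from collections import deque


class TurnWindowDeduplicator:
    """Treat an identical call in the same turn as a retry, and in a later turn as new intent."""

    def __init__(self, window_size=20):
        # Recent (turn, tool_name, frozen_args) entries; old ones fall off the left.
        self._recent = deque(maxlen=window_size)

    @staticmethod
    def _freeze(arguments):
        # Order-independent, hashable view of flat keyword arguments.
        return tuple(sorted(arguments.items()))

    def is_retry(self, turn, tool_name, arguments):
        entry = (turn, tool_name, self._freeze(arguments))
        if entry in self._recent:
            return True
        self._recent.append(entry)
        return False


dedup = TurnWindowDeduplicator()
email = {"to": "alice", "body": "hello"}
first_at_turn_5 = dedup.is_retry(5, "send_email", email)    # new call
repeat_at_turn_5 = dedup.is_retry(5, "send_email", email)   # same turn: retry
again_at_turn_8 = dedup.is_retry(8, "send_email", email)    # later turn: new intent
```

A production version would likely weigh more context than the turn number, such as whether user input arrived between the two calls.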

Operation Logs

The most robust pattern maintains a persistent log of every tool call, its arguments, its result, and its completion status. Before executing any tool call, the agent checks the operation log. If a matching call exists and completed successfully, the cached result is returned. If a matching call exists but failed, it is retried with the original idempotency key. If no matching call exists, a new entry is created before execution begins.
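The check-then-execute logic described above might look like the following sketch. It keeps the log in a dictionary purely so the example runs on its own; a real log would need durable storage. OperationLog, run, and flaky_double are hypothetical names.

```python
class OperationLog:
    """In-memory stand-in for a persistent operation log (assumption: a real
    deployment would back this with durable storage)."""

    def __init__(self):
        self.entries = {}   # idempotency key -> record dict

    def run(self, key, tool_name, arguments, execute):
        record = self.entries.get(key)
        if record is not None and record["status"] == "completed":
            return record["result"]                 # cached result, no re-execution
        if record is None:
            # Create the entry before execution begins.
            record = {"tool": tool_name, "args": arguments,
                      "status": "pending", "result": None, "attempts": 0}
            self.entries[key] = record
        # Either a new call or a retry of one that did not complete,
        # in both cases under the original key.
        record["attempts"] += 1
        record["status"] = "pending"
        try:
            record["result"] = execute(**arguments)
        except Exception as exc:
            record["status"] = "failed"
            record["error"] = repr(exc)
            raise
        record["status"] = "completed"
        return record["result"]


attempts = []

def flaky_double(x):
    """Hypothetical tool that fails on its first attempt only."""
    attempts.append(x)
    if len(attempts) == 1:
        raise RuntimeError("transient failure")
    return x * 2

log = OperationLog()
try:
    log.run("key-1", "double", {"x": 3}, flaky_double)
except RuntimeError:
    pass                                                      # entry is now marked failed
first = log.run("key-1", "double", {"x": 3}, flaky_double)    # retried, succeeds
second = log.run("key-1", "double", {"x": 3}, flaky_double)   # served from the log
```

The tool runs twice in total: once for the failed attempt and once for the retry. The third call never reaches it.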

The operation log pattern also solves the observability problem. When something goes wrong, you have a complete audit trail of what the agent attempted, what succeeded, what failed, and what was retried. This is the kind of operational metric that matters -- not model accuracy scores, but system reliability under real-world conditions.

Side-Effect Classification: Idempotent vs Non-Idempotent Tools

Not all tools are created equal. The idempotency strategy depends on the side-effect profile of each tool.

Naturally idempotent tools can be safely retried without any special handling. HTTP PUT (setting a value), database upserts (insert or update), and read operations are idempotent by nature. Calling them twice produces the same state as calling them once.

Non-idempotent tools change state in ways that compound with repetition. HTTP POST (creating a new resource), payment charges, email sends, and counter increments are non-idempotent. Calling them twice produces different state than calling them once.

Conditionally idempotent tools depend on the arguments. A database write is idempotent if it uses an upsert with a client-provided ID. It is non-idempotent if it uses auto-generated IDs. A message send is idempotent if the messaging system supports deduplication keys. It is non-idempotent if it does not.

The agent's tool registry should classify every tool by its side-effect profile. Non-idempotent tools get mandatory idempotency key injection. Naturally idempotent tools get logging but no deduplication. Conditionally idempotent tools get runtime analysis based on the specific arguments.
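As an illustration, the classification can be as small as an enum plus a lookup table. The tool names below are made up, and the rule used for the conditional case (a client-provided "id" argument implies an upsert) is an assumption chosen to mirror the database-write example above.

```python
from enum import Enum


class SideEffect(Enum):
    IDEMPOTENT = "naturally idempotent"
    NON_IDEMPOTENT = "non-idempotent"
    CONDITIONAL = "conditionally idempotent"


# Hypothetical registry: tool name -> side-effect profile.
TOOL_PROFILES = {
    "get_order": SideEffect.IDEMPOTENT,
    "charge_payment": SideEffect.NON_IDEMPOTENT,
    "send_email": SideEffect.NON_IDEMPOTENT,
    "write_record": SideEffect.CONDITIONAL,
}


def needs_idempotency_key(tool_name, arguments):
    """Decide whether the orchestrator must inject a key for this call."""
    # Unclassified tools are treated as non-idempotent: the safe default.
    profile = TOOL_PROFILES.get(tool_name, SideEffect.NON_IDEMPOTENT)
    if profile is SideEffect.IDEMPOTENT:
        return False
    if profile is SideEffect.NON_IDEMPOTENT:
        return True
    # Conditional: this sketch assumes a client-provided "id" argument
    # means the write is an upsert and therefore safe to repeat.
    return "id" not in arguments
```

Defaulting unknown tools to non-idempotent means a forgotten classification costs an unnecessary key rather than a duplicate charge.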

This classification is the kind of engineering rigor that the era of AI engineering demands. We are past the point where agent building is a prompt engineering exercise. It is systems engineering.

The Idempotency Registry Pattern

The individual patterns above combine into a centralized idempotency registry -- a service that sits between the agent orchestrator and all external tool calls.

The registry maintains a key-value store where keys are idempotency identifiers and values are operation records containing the tool name, arguments, status (pending/completed/failed), result, and timestamp. Every outbound tool call passes through the registry.

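The flow: the agent issues a tool call. The orchestrator generates an idempotency key. The registry checks for an existing record under that key. If one is found and completed, the registry returns the cached result. If one is found and pending (another execution is in flight), the caller waits or receives a conflict. If one is found and failed, the call is retried under the same key. If none is found, the registry creates a pending record, executes the tool, and updates the record with the result.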
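The flow above can be sketched in a single process as follows. This version returns a conflict rather than waiting when a key is pending, and it reclaims a failed record for retry; both are choices made for the sketch. A threading.Lock stands in for the store's atomic check-and-set, and all names are hypothetical.

```python
import threading


class ConflictError(Exception):
    """Raised when the same key is already being executed elsewhere."""


class IdempotencyRegistry:
    def __init__(self):
        self._records = {}              # key -> {"status": ..., "result": ...}
        self._lock = threading.Lock()   # guards check-and-set on _records

    def execute(self, key, tool, arguments):
        with self._lock:
            record = self._records.get(key)
            if record is not None and record["status"] == "completed":
                return record["result"]
            if record is not None and record["status"] == "pending":
                raise ConflictError(key)
            # Not found, or a previous attempt failed: claim the key.
            record = {"status": "pending", "result": None}
            self._records[key] = record
        try:
            result = tool(**arguments)  # run outside the lock
        except Exception:
            with self._lock:
                record["status"] = "failed"
            raise
        with self._lock:
            record["status"] = "completed"
            record["result"] = result
        return result


registry = IdempotencyRegistry()
sent = []

def send(to):
    sent.append(to)
    return f"sent to {to}"

first = registry.execute("email-1", send, {"to": "alice"})
replay = registry.execute("email-1", send, {"to": "alice"})   # cached, not re-sent

def tool_that_sees_duplicate():
    # A second request for the same key arrives while the first is in flight.
    try:
        registry.execute("email-2", tool_that_sees_duplicate, {})
    except ConflictError:
        return "duplicate rejected while in flight"
    return "duplicate ran"

in_flight = registry.execute("email-2", tool_that_sees_duplicate, {})
```

The tool itself runs outside the lock, so a slow tool does not block unrelated keys. Only the check-and-set is serialized.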

This pattern centralizes retry safety across all tools and all agents. It also provides a natural integration point for rate limiting (how many calls to this tool per minute), circuit breaking (stop calling a tool that is consistently failing), and cost tracking (how much are tool calls costing across all agents).

The registry should be backed by a database with strong consistency guarantees for the check-and-set operation. Redis with Lua scripting (on a single primary, since Redis replication is asynchronous), PostgreSQL with a unique constraint or advisory locks, or DynamoDB with conditional writes can all provide this. The consistency of the registry is the foundation that everything else builds on.
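To show what an atomic check-and-set looks like in practice, here is a sketch that uses SQLite. SQLite is swapped in only because it ships with Python and keeps the example runnable; it is not one of the stores named above. The table layout and the function names try_claim and complete are assumptions. The primary-key constraint is what makes the claim atomic: only one insert for a given key can succeed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE operations ("
    " op_key TEXT PRIMARY KEY,"
    " status TEXT NOT NULL,"
    " result TEXT)"
)


def try_claim(conn, key):
    """Atomically create a pending record; return True only for the first caller."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT OR IGNORE INTO operations (op_key, status) VALUES (?, 'pending')",
            (key,),
        )
    # rowcount is 1 if the row was inserted, 0 if the key already existed.
    return cursor.rowcount == 1


def complete(conn, key, result):
    """Record the outcome once the tool call has finished."""
    with conn:
        conn.execute(
            "UPDATE operations SET status = 'completed', result = ? WHERE op_key = ?",
            (result, key),
        )


first_claim = try_claim(conn, "charge:order-42")
second_claim = try_claim(conn, "charge:order-42")   # a retry arrives with the same key
complete(conn, "charge:order-42", "ch_001")
row = conn.execute(
    "SELECT status, result FROM operations WHERE op_key = ?", ("charge:order-42",)
).fetchone()
```

The same shape carries over to the other stores: a conditional write that succeeds only if the key is absent, followed by an update once the result is known.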

Compensation Patterns: The Saga Approach for Agents

Idempotency prevents duplicate execution. But what about partial execution? An agent workflow that calls three tools in sequence -- reserve inventory, charge payment, send confirmation -- can fail between any two steps. If the payment succeeds but the confirmation fails, you need to decide: retry the confirmation, or reverse the payment?

The saga pattern, borrowed from microservices architecture, provides the framework. Each tool call in a multi-step workflow has a corresponding compensation action. If step N fails, the saga executor runs compensations for steps N-1 through 1 in reverse order.

For agent systems, this means every non-idempotent tool call in a multi-step workflow needs a registered compensator. The payment tool's compensator is a refund. The inventory reservation tool's compensator is a release. The email send tool has no compensator -- you cannot unsend an email -- so it should be the last step in any saga, executed only after all reversible steps have succeeded.

The orchestrator maintains a saga log alongside the operation log. Each saga tracks its steps, their completion status, and their compensation status. When a step fails, the orchestrator can make an informed decision: retry the failed step (if idempotent), compensate and abort, or escalate to a human operator.
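A compact sketch of a saga executor for the reserve, charge, confirm workflow described above. For brevity it always compensates and aborts on failure; the retry and escalate-to-human branches are omitted. The run_saga function and the step functions are hypothetical, and the state dictionary stands in for the external systems.

```python
def run_saga(steps):
    """Run (name, action, compensator) steps in order; compensate in reverse on failure.

    Returns a saga log: a list of (step name, event) tuples.
    """
    saga_log = []
    completed = []
    for name, action, compensator in steps:
        try:
            action()
        except Exception:
            saga_log.append((name, "failed"))
            # Undo completed steps, most recent first.
            for done_name, done_compensator in reversed(completed):
                if done_compensator is not None:
                    done_compensator()
                    saga_log.append((done_name, "compensated"))
            return saga_log
        saga_log.append((name, "completed"))
        completed.append((name, compensator))
    return saga_log


state = {"reserved": 0, "charged": 0}

def reserve():
    state["reserved"] += 1

def release():
    state["reserved"] -= 1

def charge():
    state["charged"] += 5000

def refund():
    state["charged"] -= 5000

def send_confirmation():
    raise ConnectionError("mail server unreachable")

saga_log = run_saga([
    ("reserve_inventory", reserve, release),
    ("charge_payment", charge, refund),
    ("send_confirmation", send_confirmation, None),  # irreversible, so it goes last
])
```

After the confirmation step fails, the log shows the refund followed by the inventory release, and state is back to its starting values. In a real system each compensator would itself be called through the idempotency layer, since compensations can be retried too.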

This connects to how AI is transforming analysis workflows more broadly. The shift from simple request-response to complex multi-step agent workflows requires the same architectural maturity that enterprise systems have developed over decades. We are not inventing new computer science. We are applying proven distributed systems patterns to a new execution model.

Implementation Checklist

If you are building agent systems with real-world tool access, here is the minimum viable idempotency stack:

Classify every tool by side-effect profile. Document which are naturally idempotent, which need key injection, and which need saga compensators.

Generate idempotency keys at the orchestrator level, not the tool level. The orchestrator has conversation context that individual tools lack.

Log every tool call with its key, arguments, status, and result. This is your audit trail and your deduplication database.

Implement check-and-set at the registry level with strong consistency. Eventual consistency in the idempotency layer defeats the purpose.

Design compensators for every non-idempotent tool in multi-step workflows. If a tool has no compensator, it must be the terminal step.

Test failure modes explicitly. Simulate timeouts, partial failures, and duplicate deliveries in your integration tests. The happy path always works. Production is not the happy path.

Monitor retry rates as a system health metric. Rising retry rates indicate infrastructure degradation before it becomes an outage.
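For the testing item in the checklist above, one inexpensive starting point is a helper that simulates duplicate delivery and checks whether state keeps changing. This is a sketch under simple assumptions: handlers take a mutable state dictionary and a request, and equality of state is a good enough notion of "same outcome". All names are hypothetical.

```python
import copy


def check_idempotent(handler, state, request, deliveries=3):
    """Deliver the same request several times; report whether state stops
    changing after the first delivery."""
    handler(state, request)
    after_first = copy.deepcopy(state)
    for _ in range(deliveries - 1):
        handler(state, request)       # simulated duplicate delivery
    return state == after_first


def increment_counter(state, request):
    """Non-idempotent: each delivery compounds."""
    state["count"] = state.get("count", 0) + request["by"]


def set_counter(state, request):
    """Idempotent: repeated delivery leaves the same state."""
    state["count"] = request["value"]


increment_ok = check_idempotent(increment_counter, {}, {"by": 1})
set_ok = check_idempotent(set_counter, {}, {"value": 7})
```

Running a check like this against every tool handler in integration tests catches tools that were classified as idempotent but are not.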

The agents that survive production are not the ones with the best prompts. They are the ones with engineering discipline around every interaction with the outside world. Idempotency is not a nice-to-have. It is the minimum bar for agent systems that handle real transactions.