

The agent that runs your operation can't be on call

Agentic AI is being adopted faster than it's being trusted, and for good reason. An agent that takes actions in your operation isn't a faster tool — it's a new owner of decisions nobody can be paged for. The work of making it ownable is the same work it has always been.

6 min read · By Javier Bates · Founder
A control interface mid-action, an automated step running without a person at the screen

The pitch has changed

For two years the AI pitch was about authorship. Build faster. Ship sooner. Let a model write the thing. We wrote about what that leaves behind — a system with no author, an integration nobody can interview, a record that's a probability instead of a log.

The pitch has moved on. It's no longer about software a model wrote. It's about software a model runs. An agent that reads the alert and decides whether to escalate it. An agent that reads the email and decides whether to issue the refund. An agent that watches the telemetry and decides whether to throttle the pump. Not a tool a person uses. A participant that acts.

This is the part of the AI conversation worth slowing down for, because it's being adopted far faster than it's being trusted — and the gap between those two things is where operational decay lives.

An agent that acts isn't a faster version of your last automation. It's a new owner of decisions in your operation. And the question we ask of every system applies to it harder than it has ever applied to anything: when this does something at three in the morning, who can be paged, and what can they actually do when they get there?

What an agent actually changes

A deterministic automation does what it was told. When it does the wrong thing, the wrong thing is reproducible. You can read the rule, find the branch, see why it fired, and change it. The automation has no judgment, which is exactly why it can be owned — its behaviour is a thing someone wrote down.

An agent has judgment, or something the demo calls judgment. It takes an input you didn't fully anticipate, reasons about it in a way that isn't stored anywhere, and takes an action you didn't explicitly authorise for that exact case. Most of the time it's right. That's the trap. A system that's right most of the time, acting on its own, accrues trust faster than it accrues oversight.

The hardest part of putting an agent into production isn't making it clever. Every team building these will tell you the same thing once the demo is over: the hard part is giving it safe, reliable access to the systems it needs to act on — and being able to account for what it did once it has. The intelligence is cheap now. The accountability isn't.

That's not an AI problem. It's the operations problem we've always described, wearing this year's clothes.

Failure mode one: the action with no approver

The first place this shows up is in the actions themselves.

Every consequential action in a mature operation has someone accountable for it. A refund has an approver. A valve change has an operator. A dispatch has a controller. The accountability isn't bureaucracy — it's the thing that lets you reconstruct, afterwards, why a decision was made and whether it should be made the same way again.

When an agent takes that action, the approver quietly disappears. Not by decision. By omission. The agent was given the capability to act so that it wouldn't have to wait for a person, and waiting for a person was the point of the person. Now the action happens, and the honest answer to "who approved this" is "the model did, on the basis of a prompt written by someone who has since moved on."

For most operations this is invisible on day one, because the agent is right. It matters on the day it's wrong in a way that costs something, and the question isn't what did it do — the logs have that — but who stood behind the decision to let it. If the answer is nobody, you don't have an automation problem. You have an ownership vacuum with permissions.
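One way to keep the approver from disappearing is to make the agent refuse to act unless a named person currently stands behind that class of action. What follows is a minimal sketch under stated assumptions, not a prescribed implementation: `Approval`, `NoApproverError`, `authorise`, the spending ceiling, the expiry, and the example address are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Approval:
    """A standing approval: who agreed to own this class of agent action."""
    action_type: str
    approver: str       # a named, pageable person (hypothetical field)
    max_amount: float   # the ceiling the approver agreed to stand behind
    expires: datetime   # approvals lapse, so someone has to re-own them


class NoApproverError(Exception):
    """Raised when no current approval covers the requested action."""


def authorise(action_type: str, amount: float,
              approvals: list[Approval], now: datetime) -> Approval:
    """Return the approval that covers this action, or refuse to act."""
    for approval in approvals:
        if (approval.action_type == action_type
                and amount <= approval.max_amount
                and now < approval.expires):
            return approval
    raise NoApproverError(f"no current approver for {action_type} of {amount}")


# Illustrative data only: one refund approval with a 200.0 ceiling.
approvals = [
    Approval("refund", "ops-lead@example.com", 200.0,
             datetime(2025, 6, 1, tzinfo=timezone.utc)),
]
now = datetime(2025, 1, 1, 3, 0, tzinfo=timezone.utc)
owner = authorise("refund", 50.0, approvals, now)
```

The expiry is the design choice that matters here. If the person who wrote the prompt moves on, the approval lapses with them, and the agent stops acting until someone else puts their name on it.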

Failure mode two: the runbook that assumes a human was watching

The second place this shows up is in the response.

Your on-call runbook was written for deterministic failure. The pump alarm fires, the operator checks the reading, follows the steps, restores the state. The runbook works because the system's behaviour is legible and the human is the one exercising judgment.

An agent inverts that. The agent is exercising the judgment, and the human is responding to the consequences of it — often after the fact, often without the context that produced it. The page goes out at 3am: the agent rerouted the dispatch. The on-call engineer opens the runbook and there's no step for "understand why the model decided that," because the model's reasoning wasn't a thing that happened. It was an inference about a thing that happened. You can ask it why. The answer is another output, generated now, about a decision made then.

So the engineer is left with the two questions that actually matter at 3am and can't be answered from the screen: can I safely undo what it did, and can I stop it doing it again before morning. If the system was built to act but not to be reversed or paused by the person on call, the answer to both is no, and the only real control is to pull power and absorb the outage. That's not ownership. That's a kill switch standing in for one.
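Those two 3am questions can be answered in the design rather than on the night. The sketch below is one hedged way to do it, assuming every agent action is registered together with its own undo, and that the on-call engineer has a pause that stops new actions without pulling power. `ReversibleAction` and `AgentControl` are hypothetical names, and a real system would persist the history rather than hold it in memory.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ReversibleAction:
    """An action the agent may take only if it ships with its own undo."""
    description: str
    apply: Callable[[], None]
    undo: Callable[[], None]


@dataclass
class AgentControl:
    """Controls for the person on call: pause the agent, undo its last action."""
    paused: bool = False
    history: list[ReversibleAction] = field(default_factory=list)

    def execute(self, action: ReversibleAction) -> bool:
        """Run the action unless paused; report whether it ran."""
        if self.paused:
            return False
        action.apply()
        self.history.append(action)
        return True

    def pause(self) -> None:
        """Stop the agent acting again before morning, without an outage."""
        self.paused = True

    def undo_last(self) -> Optional[str]:
        """Reverse the most recent action and return its description."""
        if not self.history:
            return None
        action = self.history.pop()
        action.undo()
        return action.description


# Illustrative only: the agent reroutes a dispatch, on-call reverses it.
state = {"route": "A"}
control = AgentControl()
reroute = ReversibleAction(
    "reroute dispatch A to B",
    apply=lambda: state.update(route="B"),
    undo=lambda: state.update(route="A"),
)
control.execute(reroute)
control.pause()
undone = control.undo_last()
```

With something like this in place, the runbook gains two steps it never had: pause the agent, then undo the last action. The kill switch is still there, but it is no longer the only control.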

Failure mode three: the scope that quietly grows

The third place this shows up is in the permissions.

Agents are useful in proportion to what they're allowed to touch. So the pressure is always in one direction: give it one more tool, one more credential, one more system, because each grant makes it more capable and the last grant didn't cause a problem. The permission set starts scoped to one task and ends up scoped to the operation, one reasonable expansion at a time.

This is the oldest shape of decay we know — the integration that quietly becomes load-bearing — except the thing accumulating reach now makes its own decisions about how to use it. Nobody mapped the full set of things the agent can do, because nobody granted them in one sitting. They accreted. And the day someone finally asks "what is this allowed to do to our production systems," the honest answer takes a week to assemble, because the scope was never a contract. It was a series of conveniences.
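Turning a series of conveniences back into a contract can start with something as plain as a single written list that the agent's tool calls are checked against. The following is a sketch only, with hypothetical names throughout (`Grant`, `ScopeContract`, and the example systems and people). It assumes each grant records who gave it and why, so that "what is this allowed to do" takes a function call to answer rather than a week.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """One thing the agent may do, with the person and reason behind it."""
    system: str
    operation: str
    granted_by: str
    reason: str


class ScopeContract:
    """The full, written set of what the agent is allowed to touch."""

    def __init__(self, grants: list[Grant]):
        self._grants = {(g.system, g.operation): g for g in grants}

    def allows(self, system: str, operation: str) -> bool:
        """Deny by default: anything not written down is not permitted."""
        return (system, operation) in self._grants

    def describe(self) -> list[str]:
        """Answer 'what can this do to production' in one call."""
        ordered = sorted(self._grants.values(),
                         key=lambda g: (g.system, g.operation))
        return [f"{g.system}:{g.operation} "
                f"(granted by {g.granted_by}: {g.reason})" for g in ordered]


# Illustrative grants only.
contract = ScopeContract([
    Grant("ticketing", "read", "j.doe", "triage incoming alerts"),
    Grant("billing", "refund", "a.smith", "refunds under the agreed ceiling"),
])
summary = contract.describe()
```

The check itself is trivial. The value is that every expansion now has to be added to one list, by a named person, with a reason, which is what makes it a grant instead of an accretion.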

The same problem, three times

The three failure modes look different. They're the same problem.

Each one is a consequential decision moved out of a place where someone owned it and into a system that acts without anyone owning the action, the response, or the reach. None of that is new in kind. We've watched operations decay this way for years. What's changed is that the thing doing the deciding is now autonomous, fast, and trusted before it's accountable.

The teams doing this well are not the ones who refused to use agents. They're the ones who treated the agent as a participant in the operation and gave it what every participant in a mature operation has: a bounded scope, a reversible action, and a human who can be paged for what it decides. They put deterministic boundaries around the non-deterministic actor. They made every consequential action reversible or approvable. They wrote down what the agent is allowed to touch, and turned it back into a contract. They built the kill switch and then did the harder work that makes the kill switch the last resort instead of the only one.

That work is not what the agent demos look like. It's what the operation looks like three years on, when the agent is still running and the people who deployed it have moved on, and the team that inherited it can still answer for what it does.
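A deterministic boundary around a non-deterministic actor can be very small. As a hedged illustration using the pump example from earlier, the sketch below lets the agent propose any throttle setting, while a fixed rule that someone wrote down decides how far it can actually move. `ThrottleLimits`, `bound_throttle`, and the numbers are hypothetical, chosen only to show the shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThrottleLimits:
    """Hard limits a person wrote down; the agent cannot change them."""
    min_percent: float
    max_percent: float
    max_step: float   # largest change allowed in a single action


def bound_throttle(current: float, proposed: float,
                   limits: ThrottleLimits) -> float:
    """Clamp the agent's proposal to the step limit, then to the hard range."""
    step = max(-limits.max_step, min(limits.max_step, proposed - current))
    return max(limits.min_percent, min(limits.max_percent, current + step))


# Illustrative limits only: never below 20%, never more than 10 points at once.
limits = ThrottleLimits(min_percent=20.0, max_percent=100.0, max_step=10.0)
applied = bound_throttle(current=80.0, proposed=0.0, limits=limits)
```

When the agent is wrong, the wrong thing is now bounded and reproducible: the rule that limited it can be read, and the person who set the limits can be found.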

The question we ask

We ask the same question of every system we walk into, and an agent doesn't change it. Would the people who depend on this be able to keep it running, safely, accountably, and without us, in three years?

For an agent, that question splits into three plainer ones. Can you undo what it did? Can you find out why? Can someone be paged who is genuinely accountable for the decision to let it act, not just for restarting it?

If technology is the answer here — and often it is — an agent can be a good one. But an agent that can act in your operation and can't be owned by your operation isn't automation. It's a decision-maker you can't put on call. The work of making it ownable is the same work it has always been. The model didn't remove it. It just made it easier to skip.