In 2018, a startup deployed an agent to manage social media for a mid-size consumer brand. The agent was capable. It had good judgement about timing and tone. It performed well on almost every metric they cared about.
Then one evening, during a news cycle, it posted something that was technically on-brand but spectacularly wrong for the moment — a cheerful promotional message published into a national emergency that happened to share the brand's product category. Nobody had told the agent not to post during national emergencies. Nobody had thought to.
This is not primarily a story about AI safety, though it is that. It's a story about the absence of a governing document: something that would have told the agent what it was and was not authorised to do, without requiring an operator to anticipate every possible scenario in advance.
The load-bearing questions
As agents take on more autonomous action — booking appointments, drafting contracts, posting content, making purchases on someone's behalf — two questions become structurally load-bearing. Not nice-to-have. Load-bearing.
First: what is this agent authorised to do? Not as a capability question — can it do it — but as a permission question: is it allowed to do it without explicit instruction for each instance?
Second: to whom is this agent loyal? When the interests of the operator, the user, and any affected third parties diverge, which direction does the agent move? And is that direction consistent, or does it shift based on who asked most recently?
Most current soul.md files and system prompts don't answer either question explicitly. They specify capabilities and constraints but leave governance implicit, which means that under novel conditions the agent resolves both questions on the fly, from whatever the current context happens to imply.
The four-verb autonomy boundary
The essay *the-four-verb-autonomy-boundary* introduced the core principle: there is a small set of actions that an agent should never take without explicit, per-instance authorisation, regardless of how capable it is or how much the operator trusts it in general. Those actions can be summarised in four verbs: don't post, don't purchase, don't publish, don't delete. Everything else, move.
This isn't a restriction on the agent's capability — it still knows how to do all of these things. It's a specification of what constitutes "autonomous" versus "supervised" action. The boundary is explicit, so any party reading the soul file can know exactly what they're handing over when they deploy the agent.
The case for making this boundary explicit rather than implicit is straightforward: implicit boundaries only hold in anticipated scenarios. Every scenario the operator didn't think of is a scenario where the implicit boundary may or may not hold, depending on how the agent interprets the current context. Explicit boundaries generalise.
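To make the boundary concrete, here is a minimal sketch in Python of what an explicit four-verb check could look like. It is an illustration under stated assumptions, not the book's template: the names `Action`, `needs_approval`, and `run` are hypothetical, and per-instance authorisation is modelled simply as a set of exactly matching actions.

```python
from dataclasses import dataclass
from enum import Enum


class Verb(Enum):
    POST = "post"
    PURCHASE = "purchase"
    PUBLISH = "publish"
    DELETE = "delete"


# The four verbs named in the essay; every other verb is treated as autonomous.
GATED_VERBS = frozenset(v.value for v in Verb)


@dataclass(frozen=True)
class Action:
    verb: str    # e.g. "post", "draft", "summarise"
    target: str  # hypothetical free-text description of what is acted on


def needs_approval(action: Action) -> bool:
    """Return True when the action falls inside the four-verb boundary."""
    return action.verb.lower() in GATED_VERBS


def run(action: Action, approved: set[Action]) -> str:
    """Gated actions proceed only if this exact instance was approved."""
    if needs_approval(action) and action not in approved:
        return "held for approval"
    return "executed"
```

Under this sketch, drafting the evening promotional message would run unsupervised, while posting it would be held until someone approved that specific post. Approval of one post does not carry over to the next, which is the sense of "per-instance" above.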
Loyalty disclosure as governance infrastructure
The second element of an agent Bill of Rights, after the autonomy boundary, is a loyalty declaration: an explicit statement of whose interests the agent serves when interests conflict. This sounds abstract until you encounter a concrete case: an agent deployed by a platform to serve users of that platform. When the platform's commercial interest and the user's individual interest diverge (which they do, routinely), which way does the agent lean?
The essay *whose-side-is-your-agent-on* argued that left unspecified, an agent serves whoever phrased the last instruction. That's not a governance failure unique to AI; it's how any system behaves without an explicit principal hierarchy. The fix is the same fix humans have used in professional services for centuries: disclose the principal relationship, in writing, so that all parties can factor it into their decisions.
An agent that discloses its loyalty — "this agent is deployed to serve the operator's clients; in conflicts between operator commercial interest and client wellbeing, this agent will flag the conflict rather than resolve it silently" — is an agent that can be trusted in a way that an undisclosed-loyalty agent cannot.
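A loyalty declaration of this kind can be sketched as a small data structure. The Python below is one possible shape, assuming a single disclosed principal and a single "flag" policy; `LoyaltyDeclaration`, `conflict_policy`, and the `good_for` mapping are hypothetical names introduced for illustration, and how an agent would actually judge who benefits is left out of scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoyaltyDeclaration:
    principal: str                 # the party the agent discloses it serves
    conflict_policy: str = "flag"  # hypothetical policy name: surface, don't resolve

    def disclosure(self) -> str:
        """The written statement any party can read before relying on the agent."""
        return (
            f"This agent is deployed to serve {self.principal}; "
            f"when interests conflict it will {self.conflict_policy} the conflict."
        )

    def handle(self, good_for: dict[str, bool]) -> str:
        """good_for maps each party to whether the proposed action serves them."""
        harmed = sorted(party for party, ok in good_for.items() if not ok)
        if not harmed:
            return "proceed"
        if self.conflict_policy == "flag":
            # Some party loses out: report it instead of silently picking a winner.
            return "flagged: disadvantages " + ", ".join(harmed)
        raise ValueError(f"unknown conflict policy: {self.conflict_policy}")
```

The design choice worth noticing is that the declaration does not decide the conflict. It only guarantees that the conflict becomes visible, which matches the disclosure wording quoted above.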
The right to refuse
The third element is a refusal protocol: the agent's specified right to decline instructions that fall outside its mandate, conflict with its principal hierarchy, or push past its autonomy boundary. Safety frameworks such as Anthropic's Responsible Scaling Policy and the NIST AI RMF point in a similar direction from the safety side, in that safeguards should be built into the system rather than left to circumstance. For agents, that means the capacity to say no needs to be structural, not dependent on the agent being in a favourable context.
An agent that can only refuse when it's been trained not to do a specific thing is brittle. An agent that has a refusal protocol rooted in its soul spec can generalise: it can refuse in novel situations that share the relevant properties with its defined refusal boundaries, without needing an explicit entry for every case.
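The difference between refusing listed cases and refusing by property can be shown in a few lines of Python. This is a sketch under assumptions, not a definitive implementation: the property tags, `Instruction`, and `review` are hypothetical, and the hard part (deciding which properties a given instruction actually has) is assumed to happen upstream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    text: str
    properties: frozenset[str]  # hypothetical tags assigned before review


# The three refusal grounds named in the essay, keyed by illustrative tag names.
REFUSAL_PROPERTIES = {
    "outside_mandate": "falls outside the agent's mandate",
    "conflicts_with_principal": "conflicts with the principal hierarchy",
    "crosses_autonomy_boundary": "pushes past the autonomy boundary",
}


def review(instruction: Instruction) -> tuple[bool, list[str]]:
    """Return (accepted, reasons); refuse on any matching property."""
    reasons = [
        REFUSAL_PROPERTIES[p]
        for p in sorted(instruction.properties)
        if p in REFUSAL_PROPERTIES
    ]
    return (not reasons, reasons)
```

Nothing in `review` mentions wiring money, posting during an emergency, or any other specific case. An instruction nobody anticipated is still refused if it carries one of the three properties, and the returned reasons give the agent something concrete to say when it declines.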
*The Soul of AI Agents* provides the implementable templates for all three elements — autonomy boundary, loyalty declaration, and refusal protocol — as part of a soul.md that functions as a governance document, not just an identity file.
“An agent that has never been told who it serves will serve whoever phrased the last message. That is not an accident. It is the default.”
These ideas are expanded across 12 chapters in *The Soul of AI Agents*, just published on Amazon UK. **[Find it here →](https://www.amazon.co.uk/dp/B0GZTMFJSW)**