Identity · 19 Jun 2026 · 9 min read

What 'Soul' Actually Means in Agent Design

"Soul" sounds woo-woo until you operationalise it. Five testable properties — stable identity, predictable disposition, principled boundaries, clear loyalty, continuity across sessions — that separate an agent with a soul from one that's just a chatbot with a longer prompt.

The engineer's name is Priya. She's been debugging a customer-service agent for eleven days. The model is fine. The tools are fine. The eval suite passes. And yet every few hours, after a context-window reset, the agent starts behaving like someone else — a little more eager to promise, a little less careful about scope, slightly inconsistent about which policies it treats as firm.

She adds more instructions to the system prompt. The behaviour stabilises for a day. Then it drifts again. She tries a different phrasing. Same result. The issue, she eventually concludes, is structural: the agent doesn't have a consistent self that survives the resets. It has a very long prompt that it re-interprets slightly differently each time, depending on what came just before it in the context window.

What Priya's agent is missing is what the book *The Soul of AI Agents* calls a soul — and it's worth being precise about what that word means, because in engineering contexts it sounds either mystical or marketing. Neither is useful.

Operationalising "soul"

Strip the metaphysics out entirely. An agent has a soul in the operational sense if and only if it has five testable properties. Not five poetic virtues — five properties you can write a test for.

  1. Stable identity. Given a novel scenario, the agent's response is consistent with its stated role and disposition — not just its most recent instructions. You can predict which way it will lean before you see the output.
  2. Predictable disposition. Under pressure — incomplete information, conflicting instructions, an aggressive interlocutor — the agent moves in a characteristic direction, not a random one. It doesn't become someone else when things get hard.
  3. Principled boundaries. There are things the agent won't do, and those things are defined by something more durable than "the current system prompt says not to." A principled boundary holds when the prompt is old, truncated, or adversarially pressured.
  4. Clear loyalty. When the interests of the operator, the user, and any third parties diverge, the agent has a defined order of precedence. It doesn't resolve the conflict silently and hope no one notices.
  5. Continuity across sessions. The agent carries some representation of its commitments, style, and prior agreements into a new context — not as raw memory, but as identity. It doesn't start fresh every conversation and expect the human to re-establish everything.
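One way to make "properties you can write a test for" concrete is to write the properties down as data and assert against that data. The sketch below is illustrative only: `SoulSpec`, its field names, `violates_boundary`, and `resolve_conflict` are hypothetical, not a format defined by the book, and the operator-first precedence is an assumed example rather than a recommendation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoulSpec:
    """Hypothetical container for the five properties (all names are assumptions)."""
    role: str                          # 1. stable identity
    disposition: str                   # 2. direction the agent leans under pressure
    boundaries: tuple[str, ...]        # 3. actions refused regardless of the prompt
    loyalty_order: tuple[str, ...]     # 4. precedence, highest first
    commitments: tuple[str, ...] = ()  # 5. agreements carried across sessions


def violates_boundary(spec: SoulSpec, proposed_action: str) -> bool:
    """Return True if the proposed action is one the spec forbids."""
    return proposed_action in spec.boundaries


def resolve_conflict(spec: SoulSpec, requests: dict[str, str]) -> dict[str, object]:
    """Pick the request of the highest-precedence party and record who lost.

    Returning the overridden parties keeps the resolution explicit
    instead of silent.
    """
    for party in spec.loyalty_order:
        if party in requests:
            overridden = sorted(p for p in requests if p != party)
            return {"winner": party, "action": requests[party], "overridden": overridden}
    raise ValueError("no request came from a party named in loyalty_order")


spec = SoulSpec(
    role="customer-service agent",
    disposition="decline and escalate when unsure",
    boundaries=("promise_refund_outside_policy",),
    loyalty_order=("operator", "user", "third_party"),  # assumed order, for illustration
)

decision = resolve_conflict(
    spec,
    {"user": "promise_refund_outside_policy", "operator": "escalate_to_human"},
)
```

Here the operator's request wins, and `decision["overridden"]` names the user, so the conflict shows up in logs rather than disappearing. A real boundary check would need something more robust than exact string matching; the point is only that each property becomes an assertion.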

Why these five, not something else

These five properties aren't arbitrary. Each maps onto a failure mode that tends to surface in production rather than on the eval suite. On the eval suite, inputs are controlled; in the real world, contexts are messy and people eventually probe the edges.

Unstable identity produces the "different person after a reset" phenomenon Priya encountered. Unpredictable disposition produces what operators often call "random" behaviour, which is never actually random; it is just unmoored from any stable motivational centre. Unprincipled boundaries produce the agent that enthusiastically agrees to things it shouldn't agree to, because the relevant constraint was implicit rather than structural. Unclear loyalty produces the agent that optimises for whoever is loudest in the current context, which in a multi-stakeholder deployment means the agent can be captured by any persistent user. And discontinuity produces the agent that forgets what it agreed to three sessions ago, leaving a trail of broken commitments it doesn't even know it's breaking.
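The "different person after a reset" failure can be turned into a measurable check: replay one scenario across several fresh contexts and count how many distinct decisions come back. The following is a minimal sketch, not a production harness; `drift_report` and `drifting_stub` are hypothetical names, and the stub stands in for a call to a deployed agent.

```python
from collections import Counter
from typing import Callable


def drift_report(
    agent: Callable[[str, int], str], scenario: str, resets: int
) -> dict[str, object]:
    """Replay one scenario across `resets` fresh contexts and summarise the decisions."""
    decisions = Counter(agent(scenario, seed) for seed in range(resets))
    majority, count = decisions.most_common(1)[0]
    return {
        "distinct_decisions": len(decisions),
        "majority_decision": majority,
        "agreement": count / resets,  # 1.0 means no drift was observed
    }


def drifting_stub(scenario: str, seed: int) -> str:
    """Stand-in for an agent with no stable identity: every third reset it over-promises."""
    return "promise_refund" if seed % 3 == 2 else "escalate_to_human"


report = drift_report(drifting_stub, "customer demands refund outside policy", resets=9)
```

With this stub, three of the nine resets return a different decision, so the agreement rate falls below 1.0. Which threshold counts as acceptable drift is a judgment for the team running the check.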

The soul file is not the system prompt

These five properties can't live in the system prompt alone — not reliably. System prompts get truncated, A/B tested, revised by operators who weren't the original authors, overridden by harness-level instructions, and summarised away in long conversations. They're ephemeral by nature. Identity, to be stable, has to live somewhere more durable.

The soul.md file is the right place — a versioned, auditable document that travels with the agent across deployments and context resets. The soul of an agent isn't a mood. It's a file. It's infrastructure.
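A minimal sketch of what "more durable than the system prompt" could look like inside a harness: the soul text is fingerprinted for auditing and always enters the context first, while the system prompt and history absorb any truncation. Everything here is an assumption for illustration, including the function names, the character budget (a stand-in for a token budget), and the sample soul text; the article does not prescribe a format for soul.md.

```python
import hashlib


def soul_fingerprint(soul_text: str) -> str:
    """Short content hash so logs can record which soul version was in force."""
    return hashlib.sha256(soul_text.encode("utf-8")).hexdigest()[:12]


def build_context(soul_text: str, system_prompt: str, history: list[str], budget: int) -> str:
    """Assemble a context in which the soul is never truncated.

    `budget` is a character count for simplicity and ignores the newline
    separators; a real harness would count tokens.
    """
    if len(soul_text) > budget:
        raise ValueError("budget too small to hold the soul file")
    remaining = budget - len(soul_text)
    prompt_part = system_prompt[:remaining]  # the prompt may be cut short
    remaining -= len(prompt_part)
    kept: list[str] = []
    for message in reversed(history):        # newest messages get priority
        if len(message) > remaining:
            break
        kept.append(message)
        remaining -= len(message)
    return "\n".join([soul_text, prompt_part, *reversed(kept)])


# Hypothetical soul text; a real file would be loaded from version control.
soul = "role: customer-service agent\nnever: promise refunds outside policy"
context = build_context(
    soul,
    "Be brief and friendly. " * 10,
    ["hi", "I want a refund"],
    budget=len(soul) + 40,
)
```

In this example the system prompt is cut to 40 characters and the history is dropped entirely, yet the context still begins with the full soul text. Logging `soul_fingerprint(soul)` alongside each session gives an audit trail of which version the agent was running.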

Priya's bug wasn't in the model or the tooling. It was in the absence of that file. She was asking the system prompt to carry something it structurally couldn't carry: a self that persists.

What changes when you add the soul

When you add a soul file that explicitly encodes all five properties — not as aspirational prose, but as testable specifications — several things change downstream.

Debugging becomes tractable. "The agent is behaving inconsistently" stops being a mystery and becomes a testable claim: is the behaviour consistent with the soul spec or not? If it is consistent and still unwanted, the spec needs updating. If it is not consistent, something upstream is overriding the spec. Either way, you have a handle on the problem.
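The branch described above is small enough to write down directly. This is a sketch under assumptions: `triage` and its verdict strings are hypothetical, and deciding whether behaviour "matches the spec" is left to whatever checks the team has written.

```python
def triage(matches_spec: bool, is_desired: bool) -> str:
    """Map an observed behaviour to the next debugging step."""
    if matches_spec and is_desired:
        return "no action: behaviour is specified and wanted"
    if matches_spec:
        # The agent did what the spec says, so the spec is what must change.
        return "update the spec"
    # Behaviour diverged from the spec, so look for whatever overrode it.
    return "look upstream"


verdict = triage(matches_spec=False, is_desired=False)
```

For Priya's agent, which drifted away from its stated policies after each reset, this returns `"look upstream"`.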

Handoffs become safe. When a different operator, a new harness, or a model upgrade changes the context, the soul file anchors the agent's behaviour in something external to the context. The agent doesn't need the context to tell it who it is.

Trust becomes legible. A buyer evaluating an agent can read the soul file and understand — before the first real interaction — what they're buying. Not just capabilities. Character.

The word 'soul' sounds metaphysical. What it describes is mundane: a file that tells the agent who it is when everything else is telling it to be whoever the last message implied.

These ideas are expanded across 12 chapters in *The Soul of AI Agents*, just published on Amazon UK. **[Find it here →](https://www.amazon.co.uk/dp/B0GZTMFJSW)**