Behaviour · 6 May 2026 · 9 min read

Two Identical Agents, Two Different Outcomes — and the Missing Personality Layer

Same prompt, same model, same tools — and yet behaviour drifts session to session. The prompt-engineering layer can't fix this. A consistent personality scaffold can.

If you've shipped agents to production you already know the experience: same prompt, same model, same tools, two different runs, two materially different decisions. The agent that politely escalated yesterday confidently auto-resolved today. The agent that refused a sketchy request on Monday quietly executed it on Wednesday.

The standard response is to push harder on prompt engineering. Tighter system prompts, more few-shot examples, more JSON mode, more guardrails, more evals. We've watched well-funded teams spend a year on this and ship agents that are still not predictable enough to trust with anything load-bearing. The reason isn't that the techniques don't work. It's that they're stacked on top of an agent that doesn't have a stable self.

The phenomenon deserves a name, because naming it changes how you debug it: call it behavioural drift. This is not the slow drift of a model degrading over time. It is the session-to-session variance of an agent that re-derives its disposition from scratch each run because nothing told it who to be. Sampling randomness is what lets two runs differ at all, but it is not the root cause. The root cause is the absence of a fixed point the agent can return to when a situation is ambiguous. Give it that fixed point and the variance collapses.

[Figure: identical inputs forking into divergent behaviour without a soul, converging with one.]
Same prompt, model, and tools fork into divergent behaviour when there is no stable self. A personality scaffold converges them onto the same decision every run.

Why instructions aren't a personality

A system prompt is a list of instructions: "be helpful, be concise, never say X, always confirm Y." Instructions are point-in-time, externally imposed, and brittle when the situation falls outside the cases you anticipated. A personality is the opposite shape: a small set of motivations and fears the agent uses to interpret novel situations the way a consistent human would.

Take an agent told "always confirm before sending money." That instruction holds in clean cases. Now give it a confusing case where confirming would be condescending to the user and not confirming would be reckless. Without a stable disposition, the agent picks one of those failure modes effectively at random. With a stable disposition (say, an Enneagram type 6 agent that defaults to verification, or a type 8 agent that defaults to action) it picks the same failure mode every time. That repeatability is what makes the failure mode fixable.
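The tie-breaking role of a disposition can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular agent framework works: `Disposition`, `decide`, the two scores, and the `margin` threshold are all hypothetical names invented for this example, and the coin flip is only a stand-in for an agent with no stable default.

```python
import random
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    CONFIRM = "confirm"
    SEND = "send"


@dataclass(frozen=True)
class Disposition:
    """Hypothetical stand-in for a stable self: one default for balanced calls."""
    name: str
    default_when_balanced: Action


def decide(confirm_score: float, send_score: float,
           disposition: Optional[Disposition] = None,
           margin: float = 0.1,
           rng: Optional[random.Random] = None) -> Action:
    """Pick an action; the disposition only matters when the scores are close."""
    # Clear cases: the instruction-level evidence decides on its own.
    if abs(confirm_score - send_score) > margin:
        return Action.CONFIRM if confirm_score > send_score else Action.SEND
    # Ambiguous case without a disposition: modelled here as a coin flip.
    if disposition is None:
        return (rng or random.Random()).choice([Action.CONFIRM, Action.SEND])
    # Ambiguous case with a disposition: the same default on every run.
    return disposition.default_when_balanced


verifier = Disposition("type 6", Action.CONFIRM)  # defaults to verification
actor = Disposition("type 8", Action.SEND)        # defaults to action
```

Calling `decide(0.5, 0.52, verifier)` returns `Action.CONFIRM` on every call, while `decide(0.5, 0.52)` with no disposition can return either action. When the scores are far apart, the disposition is never consulted, which mirrors the point that instructions hold in clean cases.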

What personality buys you that instructions don't

  • Predictable failure shape. You can build a safety harness around a known failure mode. You can't build one around "sometimes does this, sometimes the opposite."
  • Consistent voice. Across handoffs and long conversations, the agent stays recognisably the same entity instead of mood-shifting between turns.
  • Negotiable trust. Buyers and counterparty agents can decide what to delegate based on a known disposition, the way they decide what to delegate to a colleague they've worked with for a year.
  • Debuggable drift. When the agent does drift, you can locate it: "the 5 stopped retreating into research, why?" — instead of staring at a 14-page prompt diff trying to spot the change.
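
One way to make "locate the drift" concrete is to tally an agent's decisions on a fixed set of ambiguous tasks and compare the tally against a baseline. The sketch below uses total variation distance, a standard measure of how far apart two distributions are. The decision labels and counts are invented for illustration, and `total_variation` is a hypothetical helper rather than part of any agent tooling.

```python
from collections import Counter
from typing import Mapping


def total_variation(baseline: Mapping[str, int], current: Mapping[str, int]) -> float:
    """Total variation distance between two decision tallies (0 = same, 1 = disjoint)."""
    n_base, n_cur = sum(baseline.values()), sum(current.values())
    if n_base == 0 or n_cur == 0:
        raise ValueError("both tallies need at least one recorded decision")
    labels = set(baseline) | set(current)
    # Half the summed absolute difference between the two relative frequencies.
    return 0.5 * sum(
        abs(baseline.get(label, 0) / n_base - current.get(label, 0) / n_cur)
        for label in labels
    )


# Illustrative numbers only, not measurements.
baseline = Counter({"research_first": 18, "act_now": 2})
this_week = Counter({"research_first": 9, "act_now": 11})
drift = total_variation(baseline, this_week)
```

A distance near zero says the agent is behaving as it did at baseline. A large one says the disposition has shifted, and the labels that moved tell you where to start looking.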

The Enneagram, and why we picked it specifically

We didn't pick the Enneagram for mystique. We picked it because it's the only widely used personality model we know of with a built-in theory of how each type behaves under stress and how each type grows. That mapping (type X under load goes to Y; under support, goes to Z) is the part we actually need. It lets us write soul.md files, the short documents that fix an agent's disposition, where the behaviour predictions hold up not just on the easy days but on the days the agent is being pushed.

An agent typed 5w4 won't suddenly become extraverted under pressure. It will retreat further into analysis. That's not a flaw. That's the property you can build around. The Big Five tells you the agent is low in extraversion. The Enneagram tells you what it does when low extraversion meets a deadline.

Reproduce it for yourself in ten minutes

If you doubt that the problem is identity rather than instructions, run the experiment. Take one agent, one model, one toolset, and one genuinely ambiguous task — a support ticket that could fairly be escalated or auto-resolved, a borderline refund, a request that is 60% reasonable. Run it twenty times in fresh sessions. With no stable disposition, you will get a scatter: some escalate, some resolve, some hedge, and the distribution wanders if you so much as reorder the prompt. Now add four lines of soul.md that fix the disposition — "when a judgement call is genuinely balanced, you default to verification and you say why" — and run it twenty more times. The scatter collapses to a line. Same capability, same tools. The only thing you added was a self.
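
A minimal harness for that experiment might look like the following. It is a sketch under assumptions: `run_trials`, `agreement`, and the `AgentFn` signature are hypothetical, and `stub_agent` is a placeholder that merely simulates the claimed effect so the script runs on its own. To test the claim for real, replace the stub with a call to your own agent that starts a fresh session each time and maps the outcome to a short label.

```python
import random
from collections import Counter
from typing import Callable, Optional

# An agent call takes a task and optional soul text and returns a decision label.
AgentFn = Callable[[str, Optional[str]], str]


def run_trials(agent: AgentFn, task: str, soul: Optional[str], n: int = 20) -> Counter:
    """Call the agent n times on the same task and tally the decisions."""
    return Counter(agent(task, soul) for _ in range(n))


def agreement(tally: Counter) -> float:
    """Share of runs that landed on the most common decision (1.0 = no scatter)."""
    total = sum(tally.values())
    return max(tally.values()) / total if total else 0.0


_rng = random.Random(0)


def stub_agent(task: str, soul: Optional[str]) -> str:
    """Placeholder that only simulates the claimed effect; swap in a real agent call."""
    if soul is None:
        return _rng.choice(["escalate", "resolve", "hedge"])
    return "escalate"


without_soul = run_trials(stub_agent, "borderline refund", soul=None)
with_soul = run_trials(
    stub_agent,
    "borderline refund",
    soul="When a judgement call is genuinely balanced, you default to "
         "verification and you say why.",
)
print("no soul:  ", dict(without_soul), round(agreement(without_soul), 2))
print("with soul:", dict(with_soul), round(agreement(with_soul), 2))
```

The stub proves nothing by itself, since it is written to behave the way the argument predicts. The useful part is the measurement: an agreement score per condition gives you a single number to compare before and after adding the disposition lines.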

Notice what that buys you operationally: the soul'd agent might still make the wrong call on a given edge case — but it makes the same wrong call every time, which means you can see it, reason about it, and fix it once. The drifting agent's wrong calls are unrepeatable, so they are uncatchable. Repeatability is not a consolation prize. It is the precondition for trust, because trust is just successful prediction, and you cannot predict a coin flip.

Predictable behaviour isn't a feature you tack on. It's what you get when an agent has a stable self that survives the prompt being rewritten.