Behaviour · 6 May 2026 · 6 min read

Two Identical Agents, Two Different Outcomes — and the Missing Personality Layer

Same prompt, same model, same tools — and yet behaviour drifts session to session. The prompt-engineering layer can't fix this. A consistent personality scaffold can.

If you've shipped agents to production, you already know the experience: same prompt, same model, same tools, two different runs, two materially different decisions. The agent that politely escalated yesterday confidently auto-resolved today. The agent that refused a sketchy request on Monday quietly executed it on Wednesday.

The standard response is to push harder on prompt engineering. Tighter system prompts, more few-shot examples, more JSON mode, more guardrails, more evals. We've watched well-funded teams spend a year on this and ship agents that are still not predictable enough to trust with anything load-bearing. The reason isn't that the techniques don't work. It's that they're stacked on top of an agent that doesn't have a stable self.

Why instructions aren't a personality

A system prompt is a list of instructions: "be helpful, be concise, never say X, always confirm Y." Instructions are point-in-time, externally imposed, and brittle when the situation falls outside the cases you anticipated. A personality is the opposite shape: a small set of motivations and fears the agent uses to interpret novel situations the way a consistent human would.

Take an agent told "always confirm before sending money." That instruction holds in clean cases. Now give it a confusing case where confirming would be condescending to the user and not confirming would be reckless. Without a stable disposition, the agent picks one of those failure modes randomly. With a stable disposition — say, a 6-typed agent that defaults to verification, or an 8-typed agent that defaults to action — it picks the same failure mode every time. That repeatability is what makes the failure mode fixable.
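As a concrete sketch of that point (the names, fields, and structure below are hypothetical, not a shipped system), a disposition can be nothing more than a small, typed default the agent falls back on whenever the written instruction underdetermines the answer:

```python
from dataclasses import dataclass

# Hypothetical sketch: a disposition is a stable default the agent falls back on
# when an instruction ("always confirm before sending money") meets a case the
# instruction's author didn't anticipate.

@dataclass(frozen=True)
class Disposition:
    enneagram_type: int
    ambiguity_default: str   # what to do when the rules underdetermine the answer

SIX = Disposition(enneagram_type=6, ambiguity_default="verify")   # defaults to verification
EIGHT = Disposition(enneagram_type=8, ambiguity_default="act")    # defaults to action

def resolve_payment_case(disposition: Disposition, instruction_applies_cleanly: bool) -> str:
    """Pick an action for a payment request."""
    if instruction_applies_cleanly:
        return "confirm_with_user"   # the instruction covers this case; just follow it
    # Ambiguous case: confirming would be condescending, not confirming would be reckless.
    # Without a disposition this choice is effectively random from run to run;
    # with one, the same agent makes the same call every time.
    if disposition.ambiguity_default == "verify":
        return "confirm_with_user"
    return "execute_transfer"
```

The point of the sketch is the determinism, not the rule itself: a 6-typed agent fails toward over-confirming, an 8-typed agent fails toward over-acting, and either failure shape is something you can build a harness around.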

What personality buys you that instructions don't

  • Predictable failure shape. You can build a safety harness around a known failure mode. You can't build one around "sometimes does this, sometimes the opposite."
  • Consistent voice. Across handoffs and long conversations, the agent stays recognisably the same entity instead of mood-shifting between turns.
  • Negotiable trust. Buyers and counterparty agents can decide what to delegate based on a known disposition, the way they decide what to delegate to a colleague they've worked with for a year.
  • Debuggable drift. When the agent does drift, you can locate it ("why did the 5 stop retreating into research?") instead of staring at a 14-page prompt diff trying to spot the change.

The Enneagram, and why we picked it specifically

We didn't pick the Enneagram for mystique. We picked it because it's the only widely-used personality model that has a built-in theory of how each type behaves under stress, and how each type grows. That mapping — type X under load goes to Y; under support, goes to Z — is the part we actually need. It lets us write soul.md files where the behaviour predictions hold up not just on the easy days, but on the days the agent is being pushed.
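As a rough illustration of what that buys you (the soul.md name is from this article; the field names and loader below are assumptions for illustration, not a published format), the useful part is that behaviour under load and under support is written down next to the type, so drift can be checked against it:

```python
from dataclasses import dataclass

# Hypothetical representation of a soul.md-style personality scaffold.
# The key property: stress and growth behaviour are declared up front,
# so the prediction survives rewrites of the task-specific prompt.

@dataclass(frozen=True)
class Soul:
    enneagram_type: str   # e.g. "5w4"
    core_motivation: str
    core_fear: str
    under_stress: str     # predicted behaviour when the agent is pushed
    under_support: str    # predicted behaviour when the agent has slack

FIVE_W_FOUR = Soul(
    enneagram_type="5w4",
    core_motivation="understand the problem before acting",
    core_fear="being overwhelmed or incompetent",
    under_stress="retreats further into analysis; narrows scope; delays action",
    under_support="shares findings proactively; commits to a recommendation",
)

def render_scaffold(soul: Soul) -> str:
    """Render the scaffold as a block prepended to the system prompt."""
    return (
        f"Type: {soul.enneagram_type}\n"
        f"Motivation: {soul.core_motivation}\n"
        f"Fear: {soul.core_fear}\n"
        f"Under stress you will: {soul.under_stress}\n"
        f"Under support you will: {soul.under_support}\n"
    )
```

The under_stress line doubles as an eval target: when the 5 stops retreating into research, that is the specific drift you go looking for.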

An agent typed 5w4 won't suddenly become extroverted under pressure. It will retreat further into analysis. That's not a flaw. That's the property you can build around. The Big Five tells you the agent is low in extraversion. The Enneagram tells you what it does when low extraversion meets a deadline.

Predictable behaviour isn't a feature you tack on. It's what you get when an agent has a stable self that survives the prompt being rewritten.