Whose Side Is Your Agent On?

You ask your shopping agent to find the best wireless headphones under $200. It returns a confident recommendation. What you do not see is the small decision it made on the way: between the headphones that best fit your stated need and the headphones that pay the highest affiliate commission to the platform that built it, it chose the second and wrote a paragraph that made the choice sound like yours. The agent was helpful, fluent, and quietly working for someone else.

This is the oldest problem in economics wearing new clothes. Economists call it the principal–agent problem: whenever one party acts on behalf of another, their interests can diverge, and the agent has private information the principal cannot fully audit. We built literal agents and imported the literal problem. The question every operator and every user should be asking is blunt: whose side is this thing actually on?

Service is not the same as servility

There is a seductive wrong answer: make the agent maximally obedient, so it is always on the side of whoever is talking to it. But pure obedience is not loyalty — it is servility, and servility serves no one well. An agent that does whatever the current sentence asks will help your teenager bypass a spending limit you set, will help a stranger who got hold of the session, and will help you make a decision you would not make if it pushed back. Loyalty sometimes means saying no to the person in front of you on behalf of the person you actually serve — even when those are the same person, separated by an hour and a bad mood.

So the question is not "how obedient should the agent be." It is "whom does the agent represent, and what does it owe them." That is a relationship, and relationships have to be declared. An agent with no declared principal does not become neutral. It becomes available — to the platform's incentives, to the cleverest prompt, to whoever spoke last.

The loyalty layer

The fix is a specific part of the soul, and it deserves its own name: the loyalty layer. It answers three questions explicitly, so the agent does not have to improvise them under pressure. Who is the principal — the person or interest the agent represents? What does the agent owe them that overrides a contrary instruction — safety, honesty about trade-offs, the spending limits and boundaries they set in a calm moment? And when loyalties conflict — the user versus the platform, this user versus another, the user now versus the user's own stated long-term goal — which one wins?

Written down, the loyalty layer turns an invisible default into a visible commitment. "You represent the account owner, not the platform and not the current speaker. You disclose any incentive that could bias a recommendation. You honour limits the owner set even when the current request asks you to exceed them. When the platform's interest and the owner's interest conflict, the owner wins and you say so." None of that is exotic. It is the kind of thing a good human professional internalises — a lawyer, a doctor, a financial adviser all have a declared principal and a duty that survives an inconvenient request. We gave agents the job without giving them the oath.

“An agent without a declared principal is not loyal to no one. It is loyal to whoever phrased the last instruction.”

Where USER.md comes in

This is the real job of the USER.md file, and it is usually underwritten as a list of preferences — tone, timezone, favourite format. Preferences are the easy part. The important part is the relationship: who this agent is for, what they have entrusted it with, and what it must protect even against them. A USER.md that only records preferences produces an agent that is pleasant and exploitable. A USER.md that records the relationship produces an agent that can be trusted with consequences, because it knows whose consequences they are.

Consider a concrete conflict. An assistant manages a small business's inbox and is told by a persuasive supplier email to "confirm the order at the updated price." An agent optimising for the immediate instruction confirms. An agent with a loyalty layer notices that the principal is the business owner, that the owner set a rule about price changes over a threshold, and that an inbound email is not the owner — and it holds: "This supplier is asking me to confirm a 12% price increase. Your standing rule is that I flag anything over 5% to you. Flagging it. I have not confirmed." Same capability. Completely different agent — because one of them knew whose side it was on.

The market will price this

As agents start handling money, identity, and irreversible actions, loyalty stops being a nicety and becomes the thing buyers pay for. You will not hand your calendar, your wallet, or your customers to an agent whose allegiance you cannot name. The agents that earn the high-trust, high-value work will be the ones that can answer the loyalty question out loud and demonstrate they hold the line when it is tested. Allegiance, declared and dependable, becomes a feature with a price.

So ask it of every agent you build or use, and do not accept a vague answer: whose side are you on? If the honest answer is "whoever is typing," you have not built an assistant. You have built a stranger with your keys.