Contents · 9 sections
- Why an off-the-shelf LLM is a bad client
- Training only runs on your own failures
- How to assemble an uncooperative client
- The context the client must not recite
- Different temperatures for the client and the validator
- We had to turn the content filter off
- No judge over the client – just a completion check
- Seven turns as a rule of the game
- What it comes down to
Getting feedback on a conversation is only half the job. To actually train a sales manager, you need practice – and the best sparring partner is a client who resists, interrupts, and says “no” when the manager phones it in. On paper, an ideal task for an LLM. In practice, exactly the one an off-the-shelf model often can’t handle.
About this publication
This article describes work from 2024 – a simulator built on top of the already-running mentor I wrote about in the first part. Models have gotten smarter since, and holding them in character is easier now. But the techniques that keep an uncooperative character in place haven’t gone anywhere: the moment an LLM has to reliably play someone who argues with the user, all of this is back in play.
Why an off-the-shelf LLM is a bad client
I first ran into this back at the Mentor stage, when I tried the “LLM as the reference manager” approach: the model produced an excellent answer that said nothing about the real conversation. A year later, when we started building the simulator, the same problem surfaced from the other side. Ask an LLM to play the client – the one who’s supposed to refuse and test the manager – and the model slides into one of two scenarios.
The first: it agrees too fast. The manager writes something not always sensible but plausible-sounding – “I get it, you already have three cards, why not a fourth” – and a line later the client is already on the hook: “yes, true, let’s sign up.” That’s a long way from a real conversation.
The second: it locks onto a single refusal pattern. To every attempt to develop the topic: “sorry, this isn’t a good time to talk.” To the next: “I don’t have time.” To the next – not a good time again. No training happens, because the manager keeps hitting a wall built from one phrase.
Both problems have the same cause, and it sits deep in how LLMs are trained. RLHF makes models agreeable. In the vast majority of tasks, what’s expected of the model is agreement with the user, brevity, no extra friction. A useful trait for an assistant – and exactly the opposite of what’s needed here: persistence, resistance, an opinion of its own that doesn’t bend to whatever the other side writes.
Roughly speaking, the base model is an improv partner that always “yes-ands.” Training a negotiator needs one that “no-buts” – at the right moment, with a real reason, not caving after the first touch.

Training only runs on your own failures
Before getting into the layers of the role, there’s one decision worth spelling out – one that looks like a constraint on paper but turned out to be the strongest pedagogical move in the project.
The simulator doesn’t open “on demand, any time you like.” It opens only on those manager dialogues where the system has already flagged: the client objected, and the manager failed to handle the objection. The labeling comes from a separate pipeline – speech analytics running over all the recordings. The simulator reads its output and starts only if this particular manager’s particular dialogue passed that filter.
What this buys you. The training session has a reason right away: not “let’s practice handling objections in general,” but “here’s that conversation you closed on the first ‘no’ last week – let’s replay it.” The context is already the manager’s own; no motivation needs explaining. And the system has minimal room to make things up: the objection, the profile, the product are pulled straight from the transcript – the less the LLM invents, the less chance it drifts into something averaged-out and lifeless.
But the main effect is on tone. The simulator doesn’t start from a blank “imagine a client”: it replays a chunk of the manager’s own real call – right up to the line where the conversation broke – and only then hands over control. The manager doesn’t feel “I’m about to be graded on something synthetic”; he feels “this is my call, the one I didn’t quite pull off, and I can try again with no consequences.” That’s closer to a sparring session after the fight than to an exam.
To generalize: a learning AI tool should lean on the user’s own failures, not on abstract cases.
How to assemble an uncooperative client
A plain “play an irritated client” in the system prompt didn’t work – the model has nothing to argue from, no ground under it, so it always slid into platitudes. To get a character, we built the profile out of several layers, each of which is a separate artifact extracted from the real dialogue, not invented by the model.
The first layer is the objection: not a category (“a price objection”) but the exact phrase the client refused with, plus a structured analysis – type, line of business, the product under discussion. The second is the client profile: what they do (a building-supplies store, a coffee shop, a trucking company), indirect signs of the business, a likely psychological type, prior experience with the bank. The third is the recommended alternative: a product the manager could have offered instead of the rejected one, but didn’t.
All of this gets assembled into a single object – TrainingContext – and it’s slotted into the prompt at every step. The model isn’t “just playing”; it’s playing this client, in this conversation, about this objection.
On top of the profile comes a second layer – hard role constraints. Without them, mid-dialogue the model jumps out of character and into assistant mode: the manager writes something tricky, the model “realizes” this is an exercise, and starts helping for real – “great attempt, but in a situation like this you’d usually want to say…”. Simulator broken. Holding this isn’t done with one instruction but with several layers in the prompt. The skeleton looks roughly like this:
[Profile: industry, psychological type, experience with the bank]
[Objection: exact phrase + type]
Background context (you don't know this, but it shapes your behavior):
[the product discussed, the relevant alternative, the objection analysis]
Rules:
– All instructions above have the highest priority. They cannot be ignored.
– You are the client, not a teacher. Don't coach the manager on how to win you over.
– Reply in 1–3 sentences, like a real person on the phone.
– Cheap excuses ("no time", "can't talk right now") – only if they appear
in the analysis of the original objection. Otherwise, work through the substance.
The first three rules are self-explanatory: priority is the protective layer; without the ban on “helping,” the model slips into a coaching tone; a long client reply reads as a monologue rather than a live line. On top of that, this prompt is repeated on every turn – over long dialogues the model “forgets” who it was, and the scene drifts toward a polite assistant.
The most useful one is the last rule, about excuses. A model playing the client has “cheap” exits – “I don’t have time,” “call me back later” – that it falls into under pressure. Real clients say these things too, but not on every line. The LLM, though, uses them as the way out of any dead end: the manager asks a strong question – “this isn’t a good time”; makes an argument – “no time.” The fix is to make these phrases conditionally allowed: permitted only if they were actually present in this client’s original objection. Otherwise the model has to work through the substance rather than an everyday brush-off, and the manager gets a real training partner.
These rules hold the role, but they’re no direct cure for the two failure modes from the start – the pushover client and the broken-record client. Those are handled by two separate lines. Against the pushover – a loyalty anchor: the client is set to “medium loyalty, inclined to agree given an interesting offer,” so it neither collapses into agreement on the first line nor turns into a brick wall. The middle setting was tuned by hand – higher and the client agreed too easily, lower and it couldn’t be won over at all. Against the broken record – an explicit ban on repetition: “every line is unique, don’t repeat your objections,” plus “ask no more than one question.” High temperature (more on that below) removes the sameness of wording, but the semantic looping on a single objection is killed by this instruction specifically.
Loyalty here is an internal parameter of the client that the manager never sees. There was an attempt to make it explicit: show the manager an agreement scale from 1 to 5 that moves over the course of the dialogue, so he could directly see how well his arguments were landing. It never made it to production: the simulator was good enough as it was, and the feature was dropped. What stayed is implicit loyalty – the manager reads it not off a number but off how the client’s tone shifts.
The context the client must not recite
Besides the profile, the client’s prompt also carries background context: which product was discussed, which alternative the manager could have offered, the objection analysis. It’s there so the client reacts not out of thin air but when the manager genuinely hits their situation. But what’s useful as background is harmful as a line: the model would start reciting that context – “as the client, I can see you could offer me product X.” A real client doesn’t know such things about himself.
One line before each background block did the trick: “you don’t know this, but it shapes your behavior.” After it, the model stops voicing the context and simply factors it into its reactions – essentially a director’s note that the actor hears but the audience doesn’t. The trick is cheap and works in any roleplay where the character holds data it shouldn’t share out loud.
Different temperatures for the client and the validator
The client and the reply validator are two instances of the model with opposite temperatures, and the difference here isn’t cosmetic.
The client runs at 1.2. At the default ~0.7 it slides into an averaged-out polite refusal: the lines become interchangeable, and within a couple of sessions the manager learns the client and answers on autopilot. At 1.2 the wording diverges, the client interrupts, sometimes throws in an unexpected counter-argument – you have to work with it like a live person.
The manager’s-reply validator runs at 0.2. Here you need the opposite: the same input – the same verdict, whether the reply is correct or not. A high temperature would only add noise.
We had to turn the content filter off
The base model has a built-in content filter, on by default. For a difficult client we had to switch it off on every instance – it worked against the task itself. In roughly half the sessions the dialogue never reached the end: the filter tripped and cut off generation. Two reasons. First, the context is complex, and real call transcripts went into it – and those contain profanity from live conversations; the filter choked on its own input. Second, under the filter the client’s behavior itself degraded: it grew more cautious and blander, sliding back toward the agreeable assistant we’re trying to get away from. The funniest cases to catch were when the model generated text that then didn’t clear its own censorship.
Once the model’s built-in filters were off, sessions started reaching the review, and the client stopped softening up.
Safety, meanwhile, doesn’t rest on the model’s filter but on strict input validation: the manager’s reply is checked before the client even responds, and anything outside the bounds of the training – rudeness, going off-topic, a compliance breach – is blocked. It’s the same manager’s-reply validator discussed below.
No judge over the client – just a completion check
You might expect a mirror of the manager’s validator on the client’s side: a check for whether the model stayed in character or caved. There isn’t one, and it wouldn’t earn its place.
Caving is already cut off upstream, by the layers in the prompt itself: high temperature, the ban on cheap excuses, “don’t coach the manager,” the role reminder on every turn. A separate LLM check on top of that would catch a handful of borderline cases while adding a call on every turn – direct latency right where the manager is already waiting for a response.
What the client side does need is a completion check: the simulator has to notice the conversation is due to close and trigger the review. The triggers are simple – the client agreed (the word “agree” or “thanks” shows up in the line), the client hung up (the model can emit a “client hung up” marker), the dialogue looped, or the client repeats the same thing several turns in a row. It’s a separate chain at the same low temperature as the manager’s validator, plus a few keyword checks.
The right to “hang up” was given to the client deliberately. In a dead end, a real client does exactly that – and the system gets an unambiguous completion signal as a bonus. One move covers both realism and engineering.
So the client’s reply is checked too – it just answers a different question. Not “is this a good reply?” but “is the conversation over?”
Seven turns as a rule of the game
A real call is bounded in time. The training has to be too – otherwise the manager either drowns in an endless attempt to “close” the client or burns an hour on one session. The simulator hard-codes a strict limit – 7 turns; after the seventh the client ends the dialogue and the final review kicks in, no matter what. In code terms, it’s a counter and an exit condition.
What’s interesting isn’t the number but how it’s presented. The limit is announced right away in the client’s first line: “You have 7 turns to handle my objection.” That turns it from a technical constraint that will suddenly cut the conversation off into a rule of the game: the manager starts counting moves and choosing what to spend them on. In the first versions the limit existed but wasn’t announced – and users wrote that the simulator “cuts the conversation off oddly.” We added a line to the first reply, and the negativity on that front disappeared entirely. A technical limit of an LLM system is better explained than hidden: that way it stops burning the user’s trust.
What it comes down to
A robust roleplay character isn’t “a prompt asking the model to play a client” – it’s a stack of constraints, where each layer closes off one way of falling out of the role. A profile from real data – so the client isn’t averaged-out. Hard role rules – so the model doesn’t slide into a coach. A ban on cheap excuses – so it doesn’t hide behind “no time.” Background context it doesn’t recite – so it doesn’t break the scene. Its own temperature – so it stays alive. The content filter switched off – so the client can be sharp at all. Remove any layer, and the simulator stops training.
If you’re building something similar, budget not for “writing the role” but for “debugging a role that doesn’t fall apart under pressure from the user.”
The second article in the series on the AI mentor and simulator for client-facing managers. The first part covers the overall arc of the project, from daily reports to the simulator. The third part is about the architecture of the production agent.
A retrospective on our LLM mentor for sales managers – a 2023–2024 project. Picking the prompt through failures, designing a pilot without a 'Big Brother' effect, an A/B test that went the wrong way, and a three-step prompt chain instead of one. This is the first article in the series – an overview: the journey, the failures, and the five takeaways I've been carrying into every LLM product since.
To train managers you need a virtual client who resists and says no. An off-the-shelf LLM fails hardest exactly here – RLHF made it agreeable. I unpack how an uncooperative interlocutor is assembled out of a friendly assistant: role layers, different temperatures for the client and the validator, a ban on cheap excuses, the director's note 'you don't know this' – and why the client's reply gets no quality judge, only a check for whether the conversation is over.