What is a GM Directive in an AI roleplay bot?

A GM Directive auto-generates daily behavior instructions injected into a character's system prompt. The problem: LLMs follow them too literally, overriding the user's actual conversation context.

How do you prevent system prompt bloat in AI chatbots?

Split persona.md (personality) from rules.md (constraints), replace long explanations with concrete examples, and target 800–1200 tokens total. Compliance improves when you cut, not add.

3 Features I Built for My AI Bot — Then Deleted Every One of Them

Q: Should AI chatbot characters send proactive messages to users?

Rarely. Users find unsolicited messages intrusive unless they explicitly opt in. Responsive quality beats proactive volume in every retention metric.

I have a bad habit of building features I’m excited about instead of features users need.

Three times with Suzune, I built something I was genuinely proud of — spent weeks designing it, writing the code, testing the edge cases — and then watched it make the bot noticeably worse. Each time, the fix was deletion.

Here’s the post-mortem on all three.

Open Table of contents

Feature 1: GM Directives (Automated Daily Behavior Instructions)
Feature 2: Proactive Messaging (Characters That Message First)
Feature 3: The Ever-Expanding System Prompt
- What I built
- What actually happened
The Pattern All Three Share
What This Means for Your Bot
FAQ

Feature 1: GM Directives (Automated Daily Behavior Instructions)

What I built

Every morning at 6 AM JST, an LLM analyzed each character’s recent conversation history and generated a “director’s note” — behavioral guidance for the day:

“Today, increase vulnerability signals. She’s been too guarded. Let some softness through in the afternoon exchange.”

These directives were saved to data/gm/directives/ and injected into the system prompt at the start of every conversation:

# sessions/directive.py
directive_text = get_directives_for_prompt(character.name)
if directive_text:
    parts.append(directive_text)  # appended to system prompt

The idea: characters would feel like they were developing naturally over time, with subtle mood variations and relationship progression, without me manually scripting anything.

What actually happened

The LLM followed the directives too well.

If the directive said “increase vulnerability signals,” the character would do it regardless of context. User opens with a work complaint? She pivots to being emotionally open. User wants a playful exchange? She’s doing soft introspective moments. The directive had effectively overridden the user’s actual conversational intent.

The problem is architectural: LLMs treat everything in the system prompt as equally important. There’s no “take this as a soft suggestion” mode. I thought I was adding a layer of nuance — I was actually adding a layer that competed with and frequently defeated the user’s input.

Timeline of failure:

Phase 1: Generate directives at 6 AM and 6 PM
Phase 3: Drop 6 PM (too frequent, characters felt erratic)
Phase 7: Reduce directive weight in prompt construction
Phase 9: Move to weekly instead of daily
Phase 11: Full deletion

DIRECTIVE_HOURS = ()  # shutdown

The lesson

LLM context has no “background” layer. Everything you inject competes for attention. What you frame as a “soft guide” becomes a hard directive.

Character development should emerge from the conversation itself, not from external automation injecting intentions the user didn’t create.

GM Directive injection code and shutdown in core/memory.py

Feature 2: Proactive Messaging (Characters That Message First)

What I built

The fantasy: Suzune’s characters would have lives. They’d message you in the morning, check in during lunch, send a late-night thought. They’d feel present even when you weren’t actively chatting.

The implementation: a scheduler that triggered character-initiated messages based on elapsed time, configurable per-character:

# proactive/scheduler.py
config = {
    "min_hours": 4,
    "max_hours": 18,
    "stagger_minutes": 45,  # avoid simultaneous multi-char messages
    "allowed_hours_jst": [9, 22],
}

Eight development phases. Collision detection between characters. Priority queues. Silence windows when the user was active elsewhere. I built a small scheduling engine.

What actually happened

Users found unsolicited messages jarring. Not universally — some loved the “she messaged me” moments — but the timing was almost always wrong. The message would arrive mid-meeting, or at 11 PM when someone was trying to wind down, or as a follow-up to a conversation that had ended on a deliberate note.

The deeper problem: the messages felt like interruptions, not presences. A character checking in with “hey, thinking about you” is sweet when you’re in the mood. It’s noise when you’re not. And there’s no way to know which state the user is in.

We shipped it, watched the feedback, watched the interaction logs, and reached the same conclusion every time we analyzed the data: users want responsive quality, not proactive volume.

Phase 14: full removal. Eight phases of scheduler code, gone.

The lesson

Presence ≠ message frequency. A character who responds brilliantly to everything you say feels more alive than one who messages you unprompted twice a day.

Before building proactive features, ask whether users explicitly asked for them. In our case: they hadn’t.

System prompt split into persona.md and rules.md

Feature 3: The Ever-Expanding System Prompt

What I built

This one is embarrassing because it wasn’t a discrete feature — it was a habit.

Every time a character behaved unexpectedly, I added a rule. Character used the same metaphor twice? Add: “Never repeat metaphors within a 10-message window.” Character broke character during an emotional scene? Add: “Maintain persona integrity even when the user expresses strong emotion.”

Suzune’s initial system prompt: ~200 lines. By Phase 8: 600+ lines.

What actually happened

The LLM started ignoring rules.

Not all of them — but the rules near the end of the prompt, or any rule that conflicted with a more prominent instruction, would get quietly dropped. “No repetition” would be in the prompt and the character would still repeat. “Stay in character” would be there and the character would still break.

I ran a controlled test: same conversation, prompt at 200 lines vs 600 lines. The 200-line version had measurably better rule compliance. The 600-line version had characters that felt like they were trying to satisfy too many constraints at once — which produced wooden, hedged responses.

The fix was aggressive compression: 29% cut in a single pass.

Before	After
620 lines	440 lines
~2,400 tokens	~1,700 tokens
~65% compliance on trailing rules	~88% compliance

The main techniques:

Replace explanations with examples. "Don't be excessively formal" → "❌ 'I would be honored to assist' ✅ 'sure, let me help'" — three example pairs outperform one paragraph of instruction.
Delete redundant rules. Audit for rules that say the same thing twice. They always exist.
Split persona from constraints. Character personality → persona.md. Hard rules → rules.md. Inject selectively based on context. (I wrote up the full architecture in How I Split a 600-Line System Prompt Into 3 Files.)

The sweet spot we’ve landed on: 800–1,200 tokens for the base system prompt. Above that, compliance degrades.

Character directory structure with persona.md and rules.md

Each failure had the same shape:

Feature	The Hope	The Reality
GM Directives	Subtle character evolution	LLM overrides user intent
Proactive Messaging	Characters feel alive	Interruptions users didn’t ask for
Prompt expansion	More control	Less compliance

The common thread: I was trying to add control to a system that works best when you trust it. LLMs are good at picking up on conversational context and user tone. Every time I tried to override that with external automation or additional rules, I was fighting the model’s strengths.

The reframe that helped: AI character quality is determined more by what you remove than what you add.

What This Means for Your Bot

If you’re building an AI roleplay bot, here’s the audit I’d run:

System prompt over 1,000 tokens? Start cutting. Pick the 10 most important rules, delete the rest, and test compliance before adding anything back.
Any feature that runs on a timer? Ask whether users explicitly requested it. If not, consider whether it’s solving a real problem or a hypothetical one.
Any feature you can describe as “the LLM will subtly…”? Anything “subtle” in a system prompt is probably not subtle. Test what the model actually does with it before shipping.

The features I deleted are the best architecture decisions I made. The bot that exists now is simpler than what I planned — and substantially better for it.

FAQ

Why do more rules in a system prompt make AI characters worse?

LLMs treat all instructions as equally important. As token count grows, attention dilutes and later rules get silently dropped. 800–1,200 tokens is the practical compliance ceiling for base system prompts — past that, you’re adding rules that won’t consistently stick.

Should AI chatbot characters send proactive messages?

Rarely, and only if users opt in explicitly. Users who didn’t ask for proactive messages experience them as interruptions, not engagement. The retention data consistently favors responsive quality over outbound frequency.

What’s the difference between persona.md and rules.md?

persona.md holds the character’s personality, speech patterns, background, and emotional range — the “who she is.” rules.md holds hard behavioral constraints — what she won’t do, format rules, escalation handling. Splitting them lets you inject each selectively and audit them independently.

How do you prevent GM Directive-style overcorrection in AI characters?

Don’t inject behavioral intentions into the system prompt externally. Let character development emerge from the conversation history. If you want characters to evolve, invest in memory retrieval and context summarization — not automated directive injection.

3 Features I Built for My AI Bot — Then Deleted Every One of Them

Table of contents

Feature 1: GM Directives (Automated Daily Behavior Instructions)

What I built

What actually happened

The lesson

Feature 2: Proactive Messaging (Characters That Message First)

What I built

What actually happened

The lesson

Feature 3: The Ever-Expanding System Prompt

What I built

What actually happened

The Pattern All Three Share

What This Means for Your Bot

FAQ

Why do more rules in a system prompt make AI characters worse?

Should AI chatbot characters send proactive messages?

What’s the difference between persona.md and rules.md?

How do you prevent GM Directive-style overcorrection in AI characters?