
From Idea to Production: How I Built an AI Roleplay Bot That Actually Works

This is the story of how I built Suzune — a production AI roleplay bot running on Telegram. Not a tutorial, not a how-to. Just an honest account of every decision, mistake, and breakthrough along the way.

If you’re thinking about building your own AI companion, this might save you a few months of trial and error.


The Spark

It started, like most side projects, from frustration.

I’d been using Character.AI for roleplay. The conversation quality was incredible — their models produced responses that genuinely felt like talking to a character. But the censorship was unbearable. A romantic scene? Filtered. A character expressing anger? Sometimes filtered. The AI would literally stop mid-sentence and switch to “I appreciate your interest, but…”

I thought: I’m a developer. How hard can it be to build my own?

Answer: harder than I expected. But not impossible.


Phase 1: The Ugly Prototype (Week 1–2)

The Simplest Thing That Could Work

My first version was embarrassingly simple:

```python
# Literally my first version
SYSTEM_PROMPT = """You are Sakura, a 24-year-old writer.
You are sarcastic but caring. You speak casually.
Stay in character at all times."""
```
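Wired into a chat loop, the whole prototype was little more than this. A minimal sketch of the idea, where `call_llm` stands in for whatever OpenAI-compatible client you use (it isn't shown here):

```python
# Hypothetical sketch of the week-1 prototype: one static system prompt,
# a rolling message list, and a placeholder LLM call.

SYSTEM_PROMPT = """You are Sakura, a 24-year-old writer.
You are sarcastic but caring. You speak casually.
Stay in character at all times."""

def build_messages(history, user_text):
    """Assemble the request payload: system prompt + full history + new turn."""
    return (
        [{"role": "system", "content": SYSTEM_PROMPT}]
        + history
        + [{"role": "user", "content": user_text}]
    )
```

The entire "character" lives in that one string, which is exactly why it falls apart after a few turns.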

It worked! …For about five messages. Then it fell apart.

Lesson Learned

Start ugly, but start. That 200-line script taught me more about the actual problems than any amount of planning would have.


Phase 2: The Model Crisis (Week 3–4)

The NSFW Problem

This is the first real wall you hit when building an RP bot: the best AI models refuse adult content.

| Model | Conversation Quality | NSFW Tolerance | Cost |
| --- | --- | --- | --- |
| GPT-4 | Excellent | Very Low | High |
| Claude | Excellent | Low | Medium |
| DeepSeek V3 | Good | High | Very Low |
| Gemini | Good | Very Low | Low |

I tried every workaround I could find. None of them held up.

The Breakthrough: Multi-Model Architecture

The solution wasn’t finding one perfect model — it was using multiple models for different jobs:

```
DeepSeek V3.2 → writes the content (uncensored)
Claude Haiku  → polishes the prose (with censorship detection)
Gemini Flash  → handles NPC generation (cheap, creative)
```

DeepSeek V3 was the key unlock. It’s trained by a Chinese company and has significantly less Western content filtering. The conversation quality isn’t as polished as Claude, but it will actually write what you ask it to write.

The quality gap? That’s what the rewrite pipeline handles. DeepSeek generates a draft, Claude rewrites it for prose quality — and if Claude censors the content, we detect it and fall back to the original draft.
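The fallback logic is the important part. A minimal sketch, where the actual model calls are placeholders and the refusal markers are illustrative (any real list would be tuned against the polish model's actual refusal phrasing):

```python
# Sketch of the two-stage rewrite pipeline: draft → polish, with a
# censorship check that falls back to the draft when the polish refuses.

REFUSAL_MARKERS = (
    "i can't help with",
    "i cannot continue",
    "i appreciate your interest, but",
)

def looks_censored(draft: str, polished: str) -> bool:
    """Heuristic: did the polish stage refuse or gut the draft?"""
    lowered = polished.lower()
    if any(marker in lowered for marker in REFUSAL_MARKERS):
        return True
    # A rewrite drastically shorter than the draft is suspicious too.
    return len(polished) < 0.4 * len(draft)

def rewrite(draft: str, polish_fn) -> str:
    """Polish the draft; fall back to the original if the polish censored it."""
    polished = polish_fn(draft)
    return draft if looks_censored(draft, polished) else polished
```

In practice `polish_fn` wraps the Claude API call; the point is that a refusal never reaches the user, it just silently degrades to the unpolished draft.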

This architecture changed everything. Suddenly I had both quality AND freedom.


Phase 3: Making Characters Feel Real (Month 2–3)

The Static Prompt Trap

With the model problem solved, I hit the next wall: characters felt flat after the first few exchanges.

The system prompt told the AI to be sarcastic and caring. So every response was… sarcastic and caring. In exactly the same way. Every time.

Real people aren’t like that. They have moods. They remember things. Their behavior changes based on your relationship.

Dynamic System Prompts

I rebuilt the system prompt from a monolithic text block into a pipeline that assembles context dynamically (detailed in Prompt Engineering for Immersive Roleplay):

Instead of one static prompt, I now had 20+ data sources combining on every message.

The first time I tested this — when the character referenced something from three days ago and said “I’ve been thinking about what you said…” — I knew I was onto something.
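The pipeline shape is simple even if the real source list isn't. A minimal sketch with three illustrative sources (the real bot has 20+; the source functions here are my own invention, not the actual implementation):

```python
# Sketch of a dynamic prompt pipeline: each "source" is a function that
# returns a prompt fragment (or None) given the current chat state.

from datetime import datetime

def persona_source(state):
    return state["persona"]

def time_source(state):
    return f"Current time: {state['now'].strftime('%A %H:%M')}."

def memory_source(state):
    diary = state.get("diary")
    return f"Things you remember: {diary}" if diary else None

SOURCES = [persona_source, time_source, memory_source]

def assemble_system_prompt(state) -> str:
    """Run every source against the chat state, skip empties, join fragments."""
    fragments = [src(state) for src in SOURCES]
    return "\n\n".join(f for f in fragments if f)
```

Adding a new behavior becomes adding a new source function, rather than surgery on one giant text block.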

The Affection System

The most impactful feature: characters that start as strangers and warm up over time.

I implemented a 5-axis scoring system (trust, affection, respect, excitement, devotion). The character itself updates these scores as a tool call during conversation.

At low affinity, the system prompt literally forbids romantic content. The character will deflect advances — not because of a content filter, but because the character doesn’t know you well enough yet.
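A minimal sketch of how the scores and gates fit together. The axis names come from the article; the thresholds and rule wording here are made up for illustration:

```python
# Sketch of the 5-axis affinity store with behavior gates.

AXES = ("trust", "affection", "respect", "excitement", "devotion")

def clamp(value, lo=0, hi=100):
    return max(lo, min(hi, value))

def apply_update(scores: dict, deltas: dict) -> dict:
    """Apply the character's tool-call deltas, keeping every axis in 0-100."""
    return {axis: clamp(scores[axis] + deltas.get(axis, 0)) for axis in AXES}

def behavior_gates(scores: dict) -> list:
    """Translate scores into hard rules injected into the system prompt."""
    rules = []
    if scores["affection"] < 40:
        rules.append("You barely know this person. Deflect romantic advances.")
    if scores["trust"] >= 60:
        rules.append("You may share personal secrets.")
    return rules
```

Because the rules are injected as prompt text rather than filtered after the fact, refusals read as in-character behavior instead of censorship.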

As trust builds, new behaviors unlock.

Users absolutely love this. Earning the relationship is what makes it feel real.


Phase 4: Context Management (Month 3–4)

The Token Problem

AI models have limited context windows. With a complex system prompt (~3,000 tokens) and 40 messages of history, you burn through your context budget fast.

My first approach: just truncate old messages. This works until the character forgets a major plot point from 2 hours ago.

The Solution: Tiered Memory

I ended up with a three-tier memory system:

| Tier | What | Token Cost | Retention |
| --- | --- | --- | --- |
| Raw messages | Last 15 messages, verbatim | ~2,000 | Full detail |
| Chat summary | Older messages, compressed narrative | ~500 | Key events |
| Character diary | Long-term memory, character’s own notes | ~400 | Personality-relevant |

The compression pipeline runs periodically, turning raw conversation into a flowing narrative summary. The character also maintains a “diary” (memo) where it records important events and relationship changes.

This means the character can reference something from days ago — not the exact words, but the emotional significance of what happened.
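The tiering logic itself is small. A sketch under the assumption of a 15-message raw window, with the actual LLM summarizer behind a placeholder callable:

```python
# Sketch of three-tier context assembly: verbatim recent messages,
# a compressed summary of older ones, and the character's diary.

RAW_WINDOW = 15  # most recent messages kept verbatim

def build_context(messages, summarize_fn, diary: str):
    """Split history into tiers and return the pieces the prompt builder needs."""
    recent = messages[-RAW_WINDOW:]
    older = messages[:-RAW_WINDOW]
    summary = summarize_fn(older) if older else ""
    return {"diary": diary, "summary": summary, "recent": recent}
```

The token budget stays roughly constant no matter how long the conversation runs, because only the raw window grows and it's capped.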


Phase 5: Image Generation (Month 4–5)

Adding a Face

Text-only roleplay is powerful, but adding character portraits takes immersion to another level.

I integrated image generation using LoRA fine-tuned models on RunPod.

The technical challenge was consistency — making the character look like the same person across different scenes and outfits. LoRA fine-tuning solved this, but it took weeks of experimentation with training parameters.

The 2D/3D Decision

I initially built everything for anime-style (2D) characters. But when I started exploring realistic (3D) generation, I realized: why not both?

The architecture now supports both styles per character — same personality, same conversation system, different LoRA models for image generation. A character can have both an anime and a realistic visual representation.


Phase 6: Production Stability (Month 5–6)

Things That Break at Scale

Running a bot for personal use is one thing. Running it reliably for daily use is another:

Each of these was a real incident that caused a bad user experience before I built the fix.


The Architecture Today

Here’s what Suzune looks like now:

```
Telegram → Python (aiogram v3)

              ├── Character System (YAML + Markdown personas)
              ├── Memory Manager (SQLite + tiered compression)
              ├── Lorebook Engine (keyword-triggered world info)
              ├── Affection System (5-axis scoring + behavior gates)
              ├── LLM Router (DeepSeek → Claude → Gemini fallback)
              ├── Quality Rewriter (DS3.2 draft → Claude polish)
              ├── Image Generator (LoRA on RunPod)
              └── Emotion Detector (expression sprites)
```
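The LLM Router's fallback chain is worth sketching, since it's what keeps a provider outage from killing the bot. A minimal version, assuming each backend is a callable that either returns text or raises:

```python
# Sketch of the LLM router's fallback chain: try backends in priority
# order, return the first success, surface all errors if everything fails.

class AllBackendsFailed(Exception):
    pass

def route(prompt: str, backends) -> str:
    """Try each (name, call) backend in order; return the first success."""
    errors = []
    for name, call in backends:
        try:
            return call(prompt)
        except Exception as exc:  # sketch: catch-all on purpose
            errors.append((name, exc))
    raise AllBackendsFailed(errors)
```

In production you'd likely add per-backend timeouts and logging, but the ordering (DeepSeek first, Claude and Gemini behind it) is the whole trick.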

Monthly Cost

| Component | Cost |
| --- | --- |
| DeepSeek V3.2 API (via OpenRouter) | ~$15–25 |
| Claude Haiku (quality rewrites) | ~$5–10 |
| Gemini Flash (NPC generation) | ~$2–3 |
| RunPod (image generation) | ~$10–15 |
| **Total** | **~$30–50/month** |

Not bad for a system that handles thousands of messages with multiple characters, image generation, and complex memory management.


What I Wish I Knew at the Start

1. Don’t Optimize Prematurely

My first instinct was to use the cheapest model possible. Wrong. Start with the best model you can afford, get the experience right, then optimize costs later.

2. Example Dialogue > Personality Descriptions

I spent weeks tweaking personality descriptions. The single biggest quality improvement came from adding 5 example dialogue exchanges. Models are pattern matchers — show, don’t tell.

3. The Relationship Arc Is the Killer Feature

I added the affection system almost as an afterthought. It turned out to be the most engaging part of the entire experience. Characters that start cold and warm up over time create genuine emotional investment.

4. Censorship Is an Architecture Problem, Not a Prompt Problem

Stop trying to jailbreak models. Build a multi-model pipeline where the right model handles the right content. It’s more reliable, more ethical, and produces better quality.

5. Time Awareness Is Criminally Underrated

Adding timestamps to messages and current datetime to the system prompt costs almost nothing but makes an enormous difference in immersion.
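The whole feature is a few lines. A sketch of what "time awareness" amounts to in practice (function names are my own, not the bot's actual API):

```python
# Sketch of time awareness: stamp each stored message and inject the
# current datetime into the system prompt so the character notices gaps.

from datetime import datetime, timezone

def stamp(role: str, content: str, now=None) -> dict:
    """Attach an ISO timestamp to a message as it's stored."""
    now = now or datetime.now(timezone.utc)
    return {"role": role, "content": content, "ts": now.isoformat()}

def time_header(now=None) -> str:
    """One line for the system prompt giving the current datetime."""
    now = now or datetime.now(timezone.utc)
    return f"Current date and time: {now.strftime('%Y-%m-%d %H:%M')} UTC."
```

With timestamps in the history, the model can say "you disappeared for two days" without any extra machinery.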


What’s Next

Suzune is still evolving, and there’s plenty left on the roadmap.

If you’re interested in building something similar, stick around. This blog exists to share everything we learn along the way.


Want to Try AI Roleplay Without Building From Scratch?

Not everyone wants to code their own bot — and that’s fine. Here are the platforms I’d recommend based on what I’ve learned:

See our full comparison: Best NSFW AI Chatbot Platforms 2026


This is the first article in WaifuStack’s “Building Suzune” series. Follow us on X for updates.

