This is the story of how I built Suzune — a production AI roleplay bot running on Telegram. Not a tutorial, not a how-to. Just an honest account of every decision, mistake, and breakthrough along the way.
If you’re thinking about building your own AI companion, this might save you a few months of trial and error.
Table of contents
- The Spark
- Phase 1: The Ugly Prototype (Week 1–2)
- Phase 2: The Model Crisis (Week 3–4)
- Phase 3: Making Characters Feel Real (Month 2–3)
- Phase 4: Context Management (Month 3–4)
- Phase 5: Image Generation (Month 4–5)
- Phase 6: Production Stability (Month 5–6)
- The Architecture Today
- What I Wish I Knew at the Start
- What’s Next
- Want to Try AI Roleplay Without Building From Scratch?
The Spark
It started, like most side projects, from frustration.
I’d been using Character.AI for roleplay. The conversation quality was incredible — their models produced responses that genuinely felt like talking to a character. But the censorship was unbearable. A romantic scene? Filtered. A character expressing anger? Sometimes filtered. The AI would literally stop mid-sentence and switch to “I appreciate your interest, but…”
I thought: I’m a developer. How hard can it be to build my own?
Answer: harder than I expected. But not impossible.
Phase 1: The Ugly Prototype (Week 1–2)
The Simplest Thing That Could Work
My first version was embarrassingly simple:
- Platform: Telegram (free bot API, easy to set up, supports long messages)
- Model: GPT-4 via OpenAI API
- Architecture: One Python file, ~200 lines
- Character: A system prompt pasted directly in the code
```python
# Literally my first version
SYSTEM_PROMPT = """You are Sakura, a 24-year-old writer.
You are sarcastic but caring. You speak casually.
Stay in character at all times."""
```
It worked! …For about 5 messages. Then:
- The character forgot everything from the beginning of the conversation
- Speech patterns drifted into generic ChatGPT-speak
- GPT-4 refused any romantic content
- API costs were roughly $0.30 per conversation
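The first two problems are already visible in a sketch of that loop. This is a hedged reconstruction, not the original script; `model_call` stands in for the GPT-4 API client:

```python
SYSTEM_PROMPT = """You are Sakura, a 24-year-old writer.
You are sarcastic but caring. You speak casually.
Stay in character at all times."""

history = []  # grows without bound: the memory and cost problems in one line

def reply(model_call, user_message):
    """model_call: any callable taking an OpenAI-style message list and
    returning text (in the real script, a GPT-4 chat completion call)."""
    history.append({"role": "user", "content": user_message})
    messages = [{"role": "system", "content": SYSTEM_PROMPT}] + history
    text = model_call(messages)
    history.append({"role": "assistant", "content": text})
    return text
```

Every message resends the entire history, so cost climbs with conversation length, and once the context window fills up, the oldest messages silently stop mattering.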
Lesson Learned
Start ugly, but start. That 200-line script taught me more about the actual problems than any amount of planning would have.
Phase 2: The Model Crisis (Week 3–4)
The NSFW Problem
This is the first real wall you hit when building an RP bot: the best AI models refuse adult content.
| Model | Conversation Quality | NSFW Tolerance | Cost |
|---|---|---|---|
| GPT-4 | Excellent | Very Low | High |
| Claude | Excellent | Low | Medium |
| DeepSeek V3 | Good | High | Very Low |
| Gemini | Good | Very Low | Low |
I tried everything:
- Jailbreak prompts (unreliable, break randomly, degrade quality)
- “Academic context” framing (works once, then gets detected)
- Local models (quality wasn’t good enough in 2025)
The Breakthrough: Multi-Model Architecture
The solution wasn’t finding one perfect model — it was using multiple models for different jobs:
```
DeepSeek V3.2 → writes the content (uncensored)
Claude Haiku  → polishes the prose (with censorship detection)
Gemini Flash  → handles NPC generation (cheap, creative)
```
DeepSeek V3 was the key unlock. It’s trained by a Chinese company and has significantly less Western content filtering. The conversation quality isn’t as polished as Claude, but it will actually write what you ask it to write.
The quality gap? That’s what the rewrite pipeline handles. DeepSeek generates a draft, Claude rewrites it for prose quality — and if Claude censors the content, we detect it and fall back to the original draft.
This architecture changed everything. Suddenly I had both quality AND freedom.
Phase 3: Making Characters Feel Real (Month 2–3)
The Static Prompt Trap
With the model problem solved, I hit the next wall: characters felt flat after the first few exchanges.
The system prompt told the AI to be sarcastic and caring. So every response was… sarcastic and caring. In exactly the same way. Every time.
Real people aren’t like that. They have moods. They remember things. Their behavior changes based on your relationship.
Dynamic System Prompts
I rebuilt the system prompt from a monolithic text block into a pipeline that assembles context dynamically (detailed in Prompt Engineering for Immersive Roleplay). Instead of one static prompt, I now had 20+ data sources combining on every message:
- Character persona (who they are)
- Lorebook entries (what they know, triggered by keywords)
- Relationship scores (how they feel about you)
- Conversation memory (what happened before)
- Current time (affects mood and behavior)
- Character’s diary (their internal thoughts between sessions)
The first time I tested this — when the character referenced something from three days ago and said “I’ve been thinking about what you said…” — I knew I was onto something.
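Structurally, the assembly step can be sketched like this, with two of the 20+ sources shown. Section labels and helper names are illustrative, not the production code:

```python
def lorebook_hits(entries, last_user_message):
    """entries: (keyword, text) pairs. Only keyword-triggered entries
    are injected, which keeps the prompt small."""
    lowered = last_user_message.lower()
    return "\n".join(text for keyword, text in entries if keyword in lowered)

def build_system_prompt(sections):
    """sections: ordered (label, provider) pairs. A provider returning
    an empty string drops its section for this message."""
    parts = []
    for label, provider in sections:
        block = provider()
        if block:
            parts.append(f"## {label}\n{block}")
    return "\n\n".join(parts)

prompt = build_system_prompt([
    ("Persona", lambda: "Sakura, 24, writer. Sarcastic but caring."),
    ("World info", lambda: lorebook_hits(
        [("cafe", "Sakura works shifts at the Mellow Bean cafe.")],
        "meet me at the cafe?")),
])
```

Because every source is just a callable, adding relationship scores, the diary, or the current time is one more entry in the list.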
The Affection System
The most impactful feature: characters that start as strangers and warm up over time.
I implemented a 5-axis scoring system (trust, affection, respect, excitement, devotion). The character itself updates these scores as a tool call during conversation.
At low affinity, the system prompt literally forbids romantic content. The character will deflect advances — not because of a content filter, but because the character doesn’t know you well enough yet.
As trust builds, new behaviors unlock:
- Affinity 3+: The character starts using informal language
- Affinity 5+: They’ll send selfies spontaneously
- Affinity 7+: They reveal secrets and show vulnerability
Users absolutely love this. Earning the relationship is what makes it feel real.
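One way to sketch those gates. Collapsing the five axes by simple averaging is my assumption here; the article doesn't specify the actual affinity formula:

```python
AXES = ("trust", "affection", "respect", "excitement", "devotion")

# (threshold, behavior) pairs matching the unlock list above
GATES = (
    (3, "informal_language"),
    (5, "spontaneous_selfies"),
    (7, "secrets_and_vulnerability"),
)

def affinity(scores):
    """Collapse the five axes into one level. Simple average here;
    the real weighting is not given in the article."""
    return sum(scores[axis] for axis in AXES) / len(AXES)

def unlocked(scores):
    level = affinity(scores)
    return [behavior for threshold, behavior in GATES if level >= threshold]
```

The same lookup also drives the prompt side: below the first gate, the system prompt explicitly forbids romantic content, so deflection comes from the character, not a filter.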
Phase 4: Context Management (Month 3–4)
The Token Problem
AI models have limited context windows. With a complex system prompt (~3,000 tokens) and 40 messages of history, you burn through your context budget fast.
My first approach: just truncate old messages. This works until the character forgets a major plot point from 2 hours ago.
The Solution: Tiered Memory
I ended up with a three-tier memory system:
| Tier | What | Token Cost | Retention |
|---|---|---|---|
| Raw messages | Last 15 messages, verbatim | ~2,000 | Full detail |
| Chat summary | Older messages, compressed narrative | ~500 | Key events |
| Character diary | Long-term memory, character’s own notes | ~400 | Personality-relevant |
The compression pipeline runs periodically, turning raw conversation into a flowing narrative summary. The character also maintains a “diary” (memo) where it records important events and relationship changes.
This means the character can reference something from days ago — not the exact words, but the emotional significance of what happened.
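The split itself can be sketched in a few lines, with the window size taken from the table above and `summarize` standing in for the LLM compression call:

```python
RAW_WINDOW = 15  # most recent messages kept verbatim

def compress_history(messages, summarize):
    """Return (summary, raw_tail). Everything older than the raw window
    is collapsed into a narrative summary by `summarize`, which is an
    LLM call in the real pipeline."""
    if len(messages) <= RAW_WINDOW:
        return "", list(messages)
    head, tail = messages[:-RAW_WINDOW], messages[-RAW_WINDOW:]
    return summarize(head), tail
```

The diary tier sits outside this function: it is written by the character itself as a tool call, not produced by mechanical compression.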
Phase 5: Image Generation (Month 4–5)
Adding a Face
Text-only roleplay is powerful, but adding character portraits takes immersion to another level.
I integrated image generation using LoRA fine-tuned models on RunPod:
- Each character has a custom LoRA trained on their visual design (see Dynamic Character Visuals for how we handle appearance transformations)
- The bot generates images based on the current scene and the character’s mood
- Emotion detection in the text triggers appropriate facial expressions
The technical challenge was consistency — making the character look like the same person across different scenes and outfits. LoRA fine-tuning solved this, but it took weeks of experimentation with training parameters.
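The expression-triggering step mentioned above can be as simple as cue matching on the reply text. This is an illustrative sketch, not the production detector, and the cue lists are made up:

```python
# Cue words per expression sprite (illustrative, not the production list)
EXPRESSION_CUES = {
    "happy": ("laugh", "grin", "smile"),
    "angry": ("glare", "snap", "scowl"),
    "shy": ("blush", "look away", "fidget"),
}

def detect_expression(reply_text, default="neutral"):
    """Pick the expression sprite for image generation from cues
    in the character's reply."""
    lowered = reply_text.lower()
    for expression, cues in EXPRESSION_CUES.items():
        if any(cue in lowered for cue in cues):
            return expression
    return default
```

A small classifier model would be more robust, but keyword cues are free and fail gracefully to a neutral sprite.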
The 2D/3D Decision
I initially built everything for anime-style (2D) characters. But when I started exploring realistic (3D) generation, I realized: why not both?
The architecture now supports both styles per character — same personality, same conversation system, different LoRA models for image generation. A character can have both an anime and a realistic visual representation.
Phase 6: Production Stability (Month 5–6)
Things That Break at Scale
Running a bot for personal use is one thing. Running it reliably for daily use is another:
- API rate limits → Automatic fallback chains between models
- Model outages → DeepSeek goes down → falls back to Claude, with NSFW routing adjustment
- Token budget overflow → Aggressive context compression when approaching limits
- Repetitive responses → Opening pattern tracking with anti-repetition injection
- Character voice drift → Tone rules re-injected at end of prompt (recency bias fix)
- Censorship false positives → Circuit breaker pattern — if quality rewrite fails twice, skip it for 10 minutes
Each of these was a real incident that caused a bad user experience before I built the fix.
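The censorship fix is a standard circuit breaker wrapped around the rewrite step. A sketch, with the thresholds from the list above; the class and its method names are illustrative:

```python
import time

class RewriteBreaker:
    """Skip the quality-rewrite step for cooldown_s seconds after
    max_failures consecutive censorship false positives."""

    def __init__(self, max_failures=2, cooldown_s=600, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock          # injectable for testing
        self.failures = 0
        self.open_until = 0.0

    def rewrite_allowed(self):
        return self.clock() >= self.open_until

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.open_until = self.clock() + self.cooldown_s
            self.failures = 0

    def record_success(self):
        self.failures = 0
```

While the breaker is open, replies go out as the raw DeepSeek draft; after the cooldown, the rewrite step is retried automatically.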
The Architecture Today
Here’s what Suzune looks like now:
```
Telegram → Python (aiogram v3)
│
├── Character System (YAML + Markdown personas)
├── Memory Manager (SQLite + tiered compression)
├── Lorebook Engine (keyword-triggered world info)
├── Affection System (5-axis scoring + behavior gates)
├── LLM Router (DeepSeek → Claude → Gemini fallback)
├── Quality Rewriter (DS3.2 draft → Claude polish)
├── Image Generator (LoRA on RunPod)
└── Emotion Detector (expression sprites)
```
Monthly Cost
| Component | Cost |
|---|---|
| DeepSeek V3.2 API (via OpenRouter) | ~$15–25 |
| Claude Haiku (quality rewrites) | ~$5–10 |
| Gemini Flash (NPC generation) | ~$2–3 |
| RunPod (image generation) | ~$10–15 |
| Total | ~$30–50/month |
Not bad for a system that handles thousands of messages with multiple characters, image generation, and complex memory management.
What I Wish I Knew at the Start
1. Don’t Optimize Prematurely
My first instinct was to use the cheapest model possible. Wrong. Start with the best model you can afford, get the experience right, then optimize costs later.
2. Example Dialogue > Personality Descriptions
I spent weeks tweaking personality descriptions. The single biggest quality improvement came from adding 5 example dialogue exchanges. Models are pattern matchers — show, don’t tell.
3. The Relationship Arc Is the Killer Feature
I added the affection system almost as an afterthought. It turned out to be the most engaging part of the entire experience. Characters that start cold and warm up over time create genuine emotional investment.
4. Censorship Is an Architecture Problem, Not a Prompt Problem
Stop trying to jailbreak models. Build a multi-model pipeline where the right model handles the right content. It’s more reliable, more ethical, and produces better quality.
5. Time Awareness Is Criminally Underrated
Adding timestamps to messages and current datetime to the system prompt costs almost nothing but makes an enormous difference in immersion.
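The whole feature is a few lines: stamp each stored message and add one datetime line to the system prompt. A sketch; the format string is my choice:

```python
from datetime import datetime, timezone

def time_context(now=None):
    """One extra system-prompt line so the character knows whether it's
    a lazy Sunday morning or 3 a.m. on a weekday."""
    now = now or datetime.now(timezone.utc)
    return f"Current time: {now.strftime('%A, %Y-%m-%d %H:%M')} (UTC)"
```

Pair it with per-message timestamps in history and the character can notice gaps ("you disappeared for two days") for free.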
What’s Next
Suzune is still evolving. On the roadmap:
- Voice synthesis — Characters that speak with consistent voices
- Multi-character scenes — Characters interacting with each other
- Memory improvements — Better long-term continuity
- More characters — Each one is a new experiment in personality design
If you’re interested in building something similar, stick around. This blog exists to share everything we learn along the way.
Want to Try AI Roleplay Without Building From Scratch?
Not everyone wants to code their own bot — and that’s fine. Here are the platforms I’d recommend based on what I’ve learned:
- Candy AI — Best all-around experience. Chat + images + voice.
- FantasyGF — Best for AI girlfriend with photo generation.
- JanitorAI — Best for power users. Bring your own API key.
- Kupid AI — Best curated characters without the DIY.
See our full comparison: Best NSFW AI Chatbot Platforms 2026
This is the first article in WaifuStack’s “Building Suzune” series. Follow us on X for updates.