This is the story of how I built Suzune — a production AI roleplay bot running on Telegram. Not a tutorial, not a how-to. Just an honest account of every decision, mistake, and breakthrough along the way.
If you’re thinking about building your own AI companion, this might save you a few months of trial and error.
Table of contents
- The Spark
- Phase 1: The Ugly Prototype (Week 1–2)
- Phase 2: The Model Crisis (Week 3–4)
- Phase 3: Making Characters Feel Real (Month 2–3)
- Phase 4: Context Management (Month 3–4)
- Phase 5: Image Generation (Month 4–5)
- Phase 6: Production Stability (Month 5–6)
- The Architecture Today
- What I Wish I Knew at the Start
- What’s Next
- Want to Try AI Roleplay Without Building From Scratch?
The Spark
It started, like most side projects, from frustration.
I’d been using Character.AI for roleplay. The conversation quality was incredible — their models produced responses that genuinely felt like talking to a character. But the censorship was unbearable. A romantic scene? Filtered. A character expressing anger? Sometimes filtered. The AI would literally stop mid-sentence and switch to “I appreciate your interest, but…”
I thought: I’m a developer. How hard can it be to build my own?
Answer: harder than I expected. But not impossible.
Phase 1: The Ugly Prototype (Week 1–2)
The Simplest Thing That Could Work
My first version was embarrassingly simple:
- Platform: Telegram (free bot API, easy to set up, supports long messages)
- Model: GPT-4 via OpenAI API
- Architecture: One Python file, ~200 lines
- Character: A system prompt pasted directly in the code
```python
# Literally my first version
SYSTEM_PROMPT = """You are Sakura, a 24-year-old writer.
You are sarcastic but caring. You speak casually.
Stay in character at all times."""
```
It worked! …For about 5 messages. Then:
- The character forgot everything from the beginning of the conversation
- Speech patterns drifted into generic ChatGPT-speak
- GPT-4 refused any romantic content
- API costs were roughly $0.30 per conversation
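The first two problems are already visible in a sketch of that loop. This is a hedged reconstruction, not the original script; `model_call` stands in for the GPT-4 API client:

```python
SYSTEM_PROMPT = """You are Sakura, a 24-year-old writer.
You are sarcastic but caring. You speak casually.
Stay in character at all times."""

history = []  # grows without bound: the memory and cost problems in one line

def reply(model_call, user_message):
    """model_call: any callable taking an OpenAI-style message list and
    returning text (in the real script, a GPT-4 chat completion call)."""
    history.append({"role": "user", "content": user_message})
    messages = [{"role": "system", "content": SYSTEM_PROMPT}] + history
    text = model_call(messages)
    history.append({"role": "assistant", "content": text})
    return text
```

Every message resends the entire history, so cost climbs with conversation length, and once the context window fills up, the oldest messages silently stop mattering.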
Lesson Learned
Start ugly, but start. That 200-line script taught me more about the actual problems than any amount of planning would have.
Phase 2: The Model Crisis (Week 3–4)
The NSFW Problem
This is the first real wall you hit when building an RP bot: the best AI models refuse adult content.
| Model | Conversation Quality | NSFW Tolerance | Cost |
|---|---|---|---|
| GPT-4 | Excellent | Very Low | High |
| Claude | Excellent | Low | Medium |
| DeepSeek V3 | Good | High | Very Low |
| Gemini | Good | Very Low | Low |
I tried everything:
- Jailbreak prompts (unreliable, break randomly, degrade quality)
- “Academic context” framing (works once, then gets detected)
- Local models (quality wasn’t good enough in 2025)
The Breakthrough: Multi-Model Architecture
The solution wasn’t finding one perfect model — it was using multiple models for different jobs:
```
DeepSeek V3.2 → writes the content (uncensored)
Claude Haiku  → polishes the prose (with censorship detection)
Gemini Flash  → handles NPC generation (cheap, creative)
```
DeepSeek V3 was the key unlock. It’s trained by a Chinese company and has significantly less Western content filtering. The conversation quality isn’t as polished as Claude, but it will actually write what you ask it to write.
The quality gap? That’s what the rewrite pipeline handles. DeepSeek generates a draft, Claude rewrites it for prose quality — and if Claude censors the content, we detect it and fall back to the original draft.
This architecture changed everything. Suddenly I had both quality AND freedom.
Phase 3: Making Characters Feel Real (Month 2–3)
The Static Prompt Trap
With the model problem solved, I hit the next wall: characters felt flat after the first few exchanges.
The system prompt told the AI to be sarcastic and caring. So every response was… sarcastic and caring. In exactly the same way. Every time.
Real people aren’t like that. They have moods. They remember things. Their behavior changes based on your relationship.
Dynamic System Prompts
I rebuilt the system prompt from a monolithic text block into a pipeline that assembles context dynamically (detailed in Prompt Engineering for Immersive Roleplay). Instead of one static prompt, I now had 20+ data sources combining on every message:
- Character persona (who they are)
- Lorebook entries (what they know, triggered by keywords)
- Relationship scores (how they feel about you)
- Conversation memory (what happened before)
- Current time (affects mood and behavior)
- Character’s diary (their internal thoughts between sessions)
The first time I tested this — when the character referenced something from three days ago and said “I’ve been thinking about what you said…” — I knew I was onto something.
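Structurally, the assembly step can be sketched like this, with two of the 20+ sources shown. Section labels and helper names are illustrative, not the production code:

```python
def lorebook_hits(entries, last_user_message):
    """entries: (keyword, text) pairs. Only keyword-triggered entries
    are injected, which keeps the prompt small."""
    lowered = last_user_message.lower()
    return "\n".join(text for keyword, text in entries if keyword in lowered)

def build_system_prompt(sections):
    """sections: ordered (label, provider) pairs. A provider returning
    an empty string drops its section for this message."""
    parts = []
    for label, provider in sections:
        block = provider()
        if block:
            parts.append(f"## {label}\n{block}")
    return "\n\n".join(parts)

prompt = build_system_prompt([
    ("Persona", lambda: "Sakura, 24, writer. Sarcastic but caring."),
    ("World info", lambda: lorebook_hits(
        [("cafe", "Sakura works shifts at the Mellow Bean cafe.")],
        "meet me at the cafe?")),
])
```

Because every source is just a callable, adding relationship scores, the diary, or the current time is one more entry in the list.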
The Affection System
The most impactful feature: characters that start as strangers and warm up over time.
I implemented a 5-axis scoring system (trust, affection, respect, excitement, devotion). The character itself updates these scores as a tool call during conversation.
At low affinity, the system prompt literally forbids romantic content. The character will deflect advances — not because of a content filter, but because the character doesn’t know you well enough yet.
As trust builds, new behaviors unlock:
- Affinity 3+: The character starts using informal language
- Affinity 5+: They’ll send selfies spontaneously
- Affinity 7+: They reveal secrets and show vulnerability
Users absolutely love this. Earning the relationship is what makes it feel real.
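One way to sketch those gates. Collapsing the five axes by simple averaging is my assumption here; the article doesn't specify the actual affinity formula:

```python
AXES = ("trust", "affection", "respect", "excitement", "devotion")

# (threshold, behavior) pairs matching the unlock list above
GATES = (
    (3, "informal_language"),
    (5, "spontaneous_selfies"),
    (7, "secrets_and_vulnerability"),
)

def affinity(scores):
    """Collapse the five axes into one level. Simple average here;
    the real weighting is not given in the article."""
    return sum(scores[axis] for axis in AXES) / len(AXES)

def unlocked(scores):
    level = affinity(scores)
    return [behavior for threshold, behavior in GATES if level >= threshold]
```

The same lookup also drives the prompt side: below the first gate, the system prompt explicitly forbids romantic content, so deflection comes from the character, not a filter.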
Phase 4: Context Management (Month 3–4)
The Token Problem
AI models have limited context windows. With a complex system prompt (~3,000 tokens) and 40 messages of history, you burn through your context budget fast.
My first approach: just truncate old messages. This works until the character forgets a major plot point from 2 hours ago.
The Solution: Tiered Memory
I ended up with a three-tier memory system:
| Tier | What | Token Cost | Retention |
|---|---|---|---|
| Raw messages | Last 15 messages, verbatim | ~2,000 | Full detail |
| Chat summary | Older messages, compressed narrative | ~500 | Key events |
| Character diary | Long-term memory, character’s own notes | ~400 | Personality-relevant |
The compression pipeline runs periodically, turning raw conversation into a flowing narrative summary. The character also maintains a “diary” (memo) where it records important events and relationship changes.
This means the character can reference something from days ago — not the exact words, but the emotional significance of what happened.
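The split itself can be sketched in a few lines, with the window size taken from the table above and `summarize` standing in for the LLM compression call:

```python
RAW_WINDOW = 15  # most recent messages kept verbatim

def compress_history(messages, summarize):
    """Return (summary, raw_tail). Everything older than the raw window
    is collapsed into a narrative summary by `summarize`, which is an
    LLM call in the real pipeline."""
    if len(messages) <= RAW_WINDOW:
        return "", list(messages)
    head, tail = messages[:-RAW_WINDOW], messages[-RAW_WINDOW:]
    return summarize(head), tail
```

The diary tier sits outside this function: it is written by the character itself as a tool call, not produced by mechanical compression.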
Phase 5: Image Generation (Month 4–5)
Adding a Face
Text-only roleplay is powerful, but adding character portraits takes immersion to another level.
I integrated image generation using LoRA fine-tuned models on RunPod:
- Each character has a custom LoRA trained on their visual design (see Dynamic Character Visuals for how we handle appearance transformations)
- The bot generates images based on the current scene and the character’s mood
- Emotion detection in the text triggers appropriate facial expressions
The technical challenge was consistency — making the character look like the same person across different scenes and outfits. LoRA fine-tuning solved this, but it took weeks of experimentation with training parameters.
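The expression-triggering step mentioned above can be as simple as cue matching on the reply text. This is an illustrative sketch, not the production detector, and the cue lists are made up:

```python
# Cue words per expression sprite (illustrative, not the production list)
EXPRESSION_CUES = {
    "happy": ("laugh", "grin", "smile"),
    "angry": ("glare", "snap", "scowl"),
    "shy": ("blush", "look away", "fidget"),
}

def detect_expression(reply_text, default="neutral"):
    """Pick the expression sprite for image generation from cues
    in the character's reply."""
    lowered = reply_text.lower()
    for expression, cues in EXPRESSION_CUES.items():
        if any(cue in lowered for cue in cues):
            return expression
    return default
```

A small classifier model would be more robust, but keyword cues are free and fail gracefully to a neutral sprite.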
The 2D/3D Decision
I initially built everything for anime-style (2D) characters. But when I started exploring realistic (3D) generation, I realized: why not both?
The architecture now supports both styles per character — same personality, same conversation system, different LoRA models for image generation. A character can have both an anime and a realistic visual representation.
Phase 6: Production Stability (Month 5–6)
Things That Break at Scale
Running a bot for personal use is one thing. Running it reliably for daily use is another:
- API rate limits → Automatic fallback chains between models
- Model outages → DeepSeek goes down → falls back to Claude, with NSFW routing adjustment
- Token budget overflow → Aggressive context compression when approaching limits
- Repetitive responses → Opening pattern tracking with anti-repetition injection
- Character voice drift → Tone rules re-injected at end of prompt (recency bias fix)
- Censorship false positives → Circuit breaker pattern — if quality rewrite fails twice, skip it for 10 minutes
Each of these was a real incident that caused a bad user experience before I built the fix.
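The censorship fix is a standard circuit breaker wrapped around the rewrite step. A sketch, with the thresholds from the list above; the class and its method names are illustrative:

```python
import time

class RewriteBreaker:
    """Skip the quality-rewrite step for cooldown_s seconds after
    max_failures consecutive censorship false positives."""

    def __init__(self, max_failures=2, cooldown_s=600, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock          # injectable for testing
        self.failures = 0
        self.open_until = 0.0

    def rewrite_allowed(self):
        return self.clock() >= self.open_until

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.open_until = self.clock() + self.cooldown_s
            self.failures = 0

    def record_success(self):
        self.failures = 0
```

While the breaker is open, replies go out as the raw DeepSeek draft; after the cooldown, the rewrite step is retried automatically.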
The Architecture Today
Here’s what Suzune looks like now:
```
Telegram → Python (aiogram v3)
│
├── Character System (YAML + Markdown personas)
├── Memory Manager (SQLite + tiered compression)
├── Lorebook Engine (keyword-triggered world info)
├── Affection System (5-axis scoring + behavior gates)
├── LLM Router (DeepSeek → Claude → Gemini fallback)
├── Quality Rewriter (DS3.2 draft → Claude polish)
├── Image Generator (LoRA on RunPod)
└── Emotion Detector (expression sprites)
```
Monthly Cost
| Component | Cost |
|---|---|
| DeepSeek V3.2 API (via OpenRouter) | ~$15–25 |
| Claude Haiku (quality rewrites) | ~$5–10 |
| Gemini Flash (NPC generation) | ~$2–3 |
| RunPod (image generation) | ~$10–15 |
| Total | ~$30–50/month |
Not bad for a system that handles thousands of messages with multiple characters, image generation, and complex memory management.
What I Wish I Knew at the Start
1. Don’t Optimize Prematurely
My first instinct was to use the cheapest model possible. Wrong. Start with the best model you can afford, get the experience right, then optimize costs later.
2. Example Dialogue > Personality Descriptions
I spent weeks tweaking personality descriptions. The single biggest quality improvement came from adding 5 example dialogue exchanges. Models are pattern matchers — show, don’t tell.
3. The Relationship Arc Is the Killer Feature
I added the affection system almost as an afterthought. It turned out to be the most engaging part of the entire experience. Characters that start cold and warm up over time create genuine emotional investment.
4. Censorship Is an Architecture Problem, Not a Prompt Problem
Stop trying to jailbreak models. Build a multi-model pipeline where the right model handles the right content. It’s more reliable, more ethical, and produces better quality.
5. Time Awareness Is Criminally Underrated
Adding timestamps to messages and current datetime to the system prompt costs almost nothing but makes an enormous difference in immersion.
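The whole feature is a few lines: stamp each stored message and add one datetime line to the system prompt. A sketch; the format string is my choice:

```python
from datetime import datetime, timezone

def time_context(now=None):
    """One extra system-prompt line so the character knows whether it's
    a lazy Sunday morning or 3 a.m. on a weekday."""
    now = now or datetime.now(timezone.utc)
    return f"Current time: {now.strftime('%A, %Y-%m-%d %H:%M')} (UTC)"
```

Pair it with per-message timestamps in history and the character can notice gaps ("you disappeared for two days") for free.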
What’s Next
Suzune is still evolving. On the roadmap:
- Voice synthesis — Characters that speak with consistent voices
- Multi-character scenes — Characters interacting with each other
- Memory improvements — Better long-term continuity
- More characters — Each one is a new experiment in personality design
If you’re interested in building something similar, stick around. This blog exists to share everything we learn along the way.
Want to Try AI Roleplay Without Building From Scratch?
Not everyone wants to code their own bot — and that’s fine. Here are the platforms I’d recommend based on what I’ve learned:
- Candy AI — Best all-around experience. Chat + images + voice.
- FantasyGF — Best for AI girlfriend with photo generation.
- JanitorAI — Best for power users. Bring your own API key.
- Kupid AI — Best curated characters without the DIY.
See our full comparison: Best NSFW AI Chatbot Platforms 2026
This is the first article in WaifuStack’s “Building Suzune” series. Follow us on X for updates.