AI ContentAgent: VL-Coresystem

The 100-Episode AI Anime Project — A Script-to-Screen Production Line

A 10-step standardized pipeline covering script, storyboard, keyframes, image-to-video, voice and editing — aiming at 100 one-minute AI anime episodes. First pilot in production.

The goal is blunt: 100 one-minute AI anime episodes. The hard part was never generating one stunning clip — it is generating a hundred consistent ones. Characters morph between shots, aesthetics drift at random, the tool landscape reshuffles monthly. My answer: write the standard before turning the cameras on — a 35,000-word, 13-section production standard (STANDARDS.md) that domesticates the gacha pull into an assembly line.

Three governing principles

  1. Concrete nouns only. Vague words like "cinematic, 4K, premium" are banned outright; everything must be rewritten as executable description: flat cel shading, hard-edged shadows, hand-drawn ink lines, gouache-painted urban backgrounds;
  2. Describing flaws is describing reality. Every asset carries at least two marks of hand craft — line grain, slight cel registration jitter, vintage film light leaks. In animation, flaws are not cheapness; they are craft;
  3. Gacha pulls are the norm — a budget, not a failure. The reference production burned roughly 400 images and 200+ video clips to yield 40 finished shots; 20+ pulls on a hard shot is a normal investment, 2–3 on an easy one. The effort goes into curation, not into perfecting the prompt.
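Principle 3 treats pulls as a budget, so the reference numbers can be turned into planning arithmetic. The sketch below is only an illustration: the function names are hypothetical, the clip count uses the stated lower bound of 200, and applying the same rates to a 16-shot episode is an assumption, not a measured result.

```python
import math


def pulls_per_finished_shot(total_pulls: int, finished_shots: int) -> float:
    """Average number of generations spent per shot that made the final cut."""
    if finished_shots <= 0:
        raise ValueError("finished_shots must be positive")
    return total_pulls / finished_shots


def estimate_budget(shots: int, pulls_per_shot: float) -> int:
    """Number of pulls to budget for a planned shot count, rounded up."""
    return math.ceil(shots * pulls_per_shot)


# Reference production figures quoted in the text (clip count is a lower bound).
image_rate = pulls_per_finished_shot(400, 40)
clip_rate = pulls_per_finished_shot(200, 40)

# Hypothetical planning case: a 16-shot episode at the same average rates.
image_budget = estimate_budget(16, image_rate)
clip_budget = estimate_budget(16, clip_rate)
print(image_rate, clip_rate, image_budget, clip_budget)
```

Averages hide the spread the text describes (20+ pulls on a hard shot, 2 to 3 on an easy one), so a real budget would likely weight shots by difficulty rather than use one flat rate.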

Tools are compared into place, not followed into place

Every pipeline step gets a comparison table of three to four tools, with written reasoning, before a primary is chosen. The current primaries are GPT-Image-2 for stills (natural language + multi-reference, strongest consistency), Seedance 2.0 for video (multi-asset reference + native audio-visual generation), and ElevenLabs v3 for voice. The voice pick is provisional: in 2026 blind tests of Chinese TTS, Fish Audio S2 leads 8.11 to 2.36, and the primary switches the moment that result is verified. The standard binds each step's acceptance criteria, never any vendor.

Script → Style bible → Character / set sheets → Storyboard + animatic → Keyframes → Image-to-video → Voice / score / edit → Grade + subtitles
The 10-step line (main trunk)

The sequence is iron law: voice precedes lip-sync (Audio-First: recorded voice duration back-calibrates shot length), and the color palette locks at the style-bible stage; the final grade unifies, never patches. A self-check script (check-shotlist.py) automatically verifies nine hard rules per shot list, among them complete six-section prompts, craft anchors present, no vague words, and fully bound assets. Whatever a machine can check is never left to discipline.
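To make the machine-checkable idea concrete, here is a minimal sketch of four of the rules named above. It is not the actual check-shotlist.py: the shot dictionary layout, the six section names, and the field names `craft_anchors`, `assets`, and `path` are all assumptions, since the text does not spell them out. The banned words and the two-anchor minimum come from the governing principles.

```python
import re

# Assumed names for the six prompt sections; the source does not list them.
REQUIRED_SECTIONS = ("subject", "action", "camera", "style", "lighting", "craft")
# Vague words the text names as banned outright.
VAGUE_WORDS = ("cinematic", "4k", "premium")
# "At least two marks of hand craft" per asset.
MIN_CRAFT_ANCHORS = 2


def check_shot(shot: dict) -> list:
    """Return the rule violations for one shot (an empty list means it passes)."""
    errors = []
    prompt = shot.get("prompt", {})

    # Rule: all six prompt sections are present and non-empty.
    for name in REQUIRED_SECTIONS:
        if not str(prompt.get(name, "")).strip():
            errors.append(f"missing prompt section: {name}")

    # Rule: no vague words anywhere in the prompt text (case-insensitive).
    text = " ".join(str(value) for value in prompt.values()).lower()
    for word in VAGUE_WORDS:
        if re.search(rf"\b{re.escape(word)}\b", text):
            errors.append(f"vague word: {word}")

    # Rule: enough craft anchors are listed.
    if len(shot.get("craft_anchors", [])) < MIN_CRAFT_ANCHORS:
        errors.append("fewer than two craft anchors")

    # Rule: every referenced asset is bound to a file path.
    for asset in shot.get("assets", []):
        if not asset.get("path"):
            errors.append(f"unbound asset: {asset.get('id', '?')}")

    return errors


def check_shotlist(shots: list) -> dict:
    """Map shot id to its violations, keeping only the shots that fail."""
    report = {}
    for index, shot in enumerate(shots):
        errors = check_shot(shot)
        if errors:
            report[shot.get("id", f"#{index}")] = errors
    return report
```

Returning a list of violations instead of raising on the first one lets a single run report every problem in a shot list, which suits a pre-flight check run before any pulls are spent.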

The pilot: "Don't Look Back"

A Satoshi Kon-style neo-noir psychological thriller — 60 seconds, 16 shots. Logline: a woman who has been running for ten years ducks into a phone booth to call for help, and the call connects to herself ten years ago — only when she turns around does she see that the monster her younger self feared is what she has become. Three visual motifs run through the film: a glass reflection slowly falling out of sync, a circular match cut between rotary dial and pupil, and a Möbius-loop ending that precisely mirrors the opening composition. The palette is crushed into slate grey and ink green; the film's only saturated warmth is the red scarf she wore when she was young.

Every hard shot (desynced reflections, two overlapping figures, multiplied reflections) ships with a Plan B: past 15 failed pulls, switch to layered generation and DaVinci compositing — never let the producer die in a gacha black hole.
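The Plan B rule is simple enough to write down as a retry policy. The sketch below is illustrative only: `pull` and `plan_b` are hypothetical callables standing in for a single-pass generation attempt and the layered-generation-plus-compositing route, and reading "past 15 failed pulls" as "switch once 15 pulls have failed" is an assumption.

```python
from typing import Callable, Optional

PLAN_B_THRESHOLD = 15  # failed pulls tolerated before switching, per the text


def generate_with_fallback(
    pull: Callable[[int], Optional[str]],
    plan_b: Callable[[], str],
    threshold: int = PLAN_B_THRESHOLD,
) -> tuple:
    """Try single-pass pulls; after `threshold` failures, hand off to Plan B.

    `pull(attempt)` returns an accepted clip path, or None when the result
    is rejected in curation. Returns (result, strategy_used, failed_pulls).
    """
    failed = 0
    while failed < threshold:
        result = pull(failed + 1)  # attempts are numbered from 1
        if result is not None:
            return result, "single-pass", failed
        failed += 1
    # Budget exhausted: stop pulling and switch routes.
    return plan_b(), "layered+composite", failed


# Usage with stand-in callables: this pull is accepted on the third attempt.
outcome = generate_with_fallback(
    pull=lambda attempt: f"clip_{attempt}.mp4" if attempt == 3 else None,
    plan_b=lambda: "composite.mp4",
)
print(outcome)
```

Keeping the threshold as a parameter lets each hard shot carry its own cap while 15 stays the default.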