The Production Pipeline

The Agent Constellation

Two pipelines, many agents. How the constellation turns research into published comics and autonomous concept art.

The Story
The Vision

One series. Ten issues.
Ninety years of computing.

From Turing to LLMs and Beyond is a 10-issue comic series that tells the story of computing from Alan Turing's 1936 thought experiment to modern multi-agent AI systems — produced by a human-AI team using a constellation of specialized agents.

The production pipeline mirrors how we believe content pipelines for films and games work, an approach inspired by years spent as a computer graphics engineer wandering into production-pipeline and art sessions at SIGGRAPH and GDC. A Researcher gathers historical facts, a Writer crafts narratives, an Editor and a Red Team challenge the script, a Layout Designer composes panels, and an Image Generator renders the art. Every issue passes through multiple quality gates before publication.

10
Issues
438
Generated Images
8
Agent Types
199
Agent Outputs
75
Tracked Issues
6
Regen Rounds
The Comic Production Pipeline

Eight agents, one comic.

Each issue passes through a multi-agent pipeline — content is locked before any expensive image generation begins.

🔬
Research
✍️
Write
📋
Editor
⚔️
Red Team
📐
Layout
🎨
Image Gen
🔧
Assembly
👁️
Review
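The content lock described above can be sketched as a simple gated pipeline. The stage names come from the diagram; the gating logic below is a hypothetical illustration, not the actual orchestration code.

```python
# Illustrative sketch of the eight-stage comic pipeline.
# Stage names match the diagram; the gate logic is hypothetical.
STAGES = [
    "research", "write", "editor", "red_team",    # content stages
    "layout", "image_gen", "assembly", "review",  # production stages
]

CONTENT_LOCK = STAGES.index("layout")  # content locks before layout / image gen

def next_stage(current: str, gate_passed: bool) -> str:
    """Advance only when the current quality gate passes; otherwise rework."""
    i = STAGES.index(current)
    if not gate_passed:
        return current  # stay and rework at the same stage
    return STAGES[min(i + 1, len(STAGES) - 1)]

def is_locked(stage: str) -> bool:
    """True once the script has cleared every content stage."""
    return STAGES.index(stage) >= CONTENT_LOCK
```

The key property: no expensive image generation starts until `is_locked` is true, so script rework never wastes render time.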
Concept Art
The Autonomous Pipeline

From breakthroughs to triptychs.

A separate pipeline generates concept art autonomously — crawling real science sources, finding narrative connections, and producing three-beat visual sequences grounded in actual breakthroughs.

The comic production pipeline above requires a human orchestrator at every stage. The concept art pipeline is different: it runs autonomously once launched, making creative decisions through a chain of specialized roles — research, education, direction, critique, generation, and evaluation.

🌐
Deep Research
🔗
Enhancement
🧬
Educator
🎬
Director
CREA Critic
🎨
Generate
🔄
Refine
👁️
VLM Critique
📊
Curate
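The autonomous chain of roles can be sketched as functions threading a shared state, one per role. The role order matches the diagram; the stub bodies here are illustrative placeholders, not the real agents.

```python
# Minimal sketch of the autonomous role chain: each role reads and extends
# a shared state dict. Role names match the diagram; bodies are stubs.
def research(state):  state["breakthroughs"] = ["protein folding"]; return state
def educate(state):   state["threads"] = [state["breakthroughs"]];  return state
def direct(state):    state["beats"] = ["before", "moment", "after"]; return state
def critique(state):  state["approved"] = bool(state["beats"]);      return state

CHAIN = [research, educate, direct, critique]

def run_chain(state=None):
    """Run every role in order, threading the state through the chain."""
    state = state or {}
    for role in CHAIN:
        state = role(state)
    return state
```

Because each role only sees the accumulated state, the chain runs unattended once launched, exactly the property that distinguishes this pipeline from the human-orchestrated comic pipeline above.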
Stage 1: Knowledge Gathering

Crawling real science, not imagining it.

Deep Research Sweep crawls 40+ real web sources — DeepMind, NASA, Nature, arXiv, among others — extracting breakthroughs with visual descriptions, dates, and source URLs. The current database contains 61 verified breakthroughs.

Deep Research Enhancement follows links from hub pages, fetches scientific reference images, and analyzes them with Claude vision to build rich visual descriptions. 98 reference images have been analyzed this way — the pipeline sees real imagery before generating anything.

Science Educator reads the full breakthrough database and finds connections: how one discovery enabled another, where themes recur across fields, what stories span decades. It produces narrative threads — each a 3-beat arc (setup, breakthrough, consequence) connecting 2-4 breakthroughs. 27 threads identified so far, from quantum computing's journey to AI-driven protein folding.
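The thread structure described above has a natural shape as data: a titled arc over the three fixed beats, connecting a small set of breakthroughs. This is a hypothetical sketch of that shape; the field names are assumptions, only the 3-beat / 2-4-breakthrough constraints come from the text.

```python
from dataclasses import dataclass

# Hypothetical shape of a narrative thread: a 3-beat arc (setup,
# breakthrough, consequence) connecting 2-4 breakthroughs.
BEATS = ("setup", "breakthrough", "consequence")

@dataclass
class Thread:
    title: str
    breakthrough_ids: list  # 2-4 entries from the breakthrough database

    def __post_init__(self):
        if not 2 <= len(self.breakthrough_ids) <= 4:
            raise ValueError("a thread connects 2-4 breakthroughs")
```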

Stage 2: Autonomous Generation

Director, critic, generator, reviewer.

The Director designs three-beat visual sequences (before / moment / after) grounded in real breakthroughs and narrative threads. Each beat gets a detailed image prompt, emotional tone, and compositional direction.

The CREA Critic challenges every concept before any image is generated — rejecting weak ideas, clichéd compositions, or scenes that don't connect to real science. This is creative abrasion by design: bad concepts are caught before they waste generation time.

Approved concepts go to the Sparky API, which runs FLUX.2-dev on a DGX Spark. Every image gets an Always-Refine pass via image-to-image editing, using the initial output as a reference to improve detail and coherence. Visual continuity across beats is maintained by passing each beat's refined image as the reference for the next.
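The continuity scheme, each beat's refined image seeding the next, can be sketched as a chain. Here `generate` and `refine` are stubs standing in for the actual Sparky API calls; only the chaining logic reflects the text.

```python
# Illustrative sketch of the Always-Refine continuity chain. The real
# pipeline calls FLUX.2-dev via the Sparky API; these stubs just record
# which reference each beat received.
def generate(prompt, reference=None):
    return {"prompt": prompt, "reference": reference}

def refine(image):
    # image-to-image pass using the initial output as its own reference
    return {**image, "refined": True}

def render_triptych(prompts):
    """Render beats in order, passing each refined beat as the next reference."""
    images, reference = [], None
    for prompt in prompts:
        image = refine(generate(prompt, reference=reference))
        images.append(image)
        reference = image  # continuity: the next beat sees this refined image
    return images
```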

Stage 3: Multi-Signal Evaluation

Three critics, zero trust.

Every generated image faces three independent evaluations:

VLM Critique — Claude evaluates actual images (not just prompts), scoring composition, emotional impact, style coherence, narrative clarity, and grounding in real science. This catches problems that prompt-based audits miss entirely.

PickScore measures prompt-image alignment — how well the generated image matches what was asked for. HPSv2 measures aesthetic quality independent of the prompt. Together they provide automated scoring that complements the VLM's semantic evaluation.
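One way to blend these signals is a weighted average over normalized scores. PickScore and HPSv2 are real metrics, but the normalization and weights below are illustrative assumptions, not the pipeline's actual formula.

```python
# Hypothetical aggregation of the evaluation signals. The weights and the
# [0, 1] normalization are illustrative, not the pipeline's real formula.
def combine_scores(vlm: float, pickscore: float, hpsv2: float,
                   weights=(0.5, 0.25, 0.25)) -> float:
    """Weighted blend of VLM critique, PickScore, and HPSv2 scores."""
    for s in (vlm, pickscore, hpsv2):
        if not 0.0 <= s <= 1.0:
            raise ValueError("scores must be normalized to [0, 1]")
    w_vlm, w_pick, w_hps = weights
    return w_vlm * vlm + w_pick * pickscore + w_hps * hpsv2
```

Weighting the VLM highest reflects the text's point that semantic critique catches problems the automated metrics miss.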

A Sequence Continuity Check sends all three beats of a triptych to Claude together, verifying that the visual narrative reads coherently as a sequence — not just as individual images.

Results feed back into a creative memory: a winners board, mode switching (explore / exploit / pivot / thread), and a continuously updated HTML report with triptych sequences, auto-scores, and continuity badges.
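The four modes come from the text; how recent scores might drive switching between them is sketched below as a pure assumption.

```python
# The explore / exploit / pivot / thread modes are from the text; these
# switching rules are a hypothetical sketch, not the real creative memory.
def pick_mode(recent_scores, threshold=0.7):
    """Explore on a cold start, exploit when winning, pivot when stuck."""
    if not recent_scores:
        return "explore"              # no history yet: try broadly
    avg = sum(recent_scores) / len(recent_scores)
    if avg >= threshold:
        return "exploit"              # keep mining what works
    if max(recent_scores) >= threshold:
        return "thread"               # follow the one strong result
    return "pivot"                    # nothing is working: change tack
```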

Quality
The Regen Rounds

75 issues tracked.
Six rounds of fixes.

AI image generation is probabilistic. Laptops appear in 1936 Cambridge. German text gets garbled. CRT monitors show up in the 1940s. Each problem was tracked, categorized, and systematically fixed.

By category:

Image Regen (34), SVG Fix (16), Layout Tweak (10), Prompt Fix (8), Manifest Edit (4), System Rule (3)

By resolution:

Fixed (43), Open (31), Won't Fix (1)
Round 1 — Initial Generation
438 images across 10 issues
First complete pass with Z-Image-Turbo on local MPS. 52 problems identified in the initial post-generation review.
Round 2 — Critical Fixes
4 critical + 8 warning issues
Laptops in 1930s scenes. Bell Labs sign text garbled. Terminal count mismatches (6 vs 5).
Round 3 — Prompt Refinement
7 panels re-prompted
Seed-then-prompt escalation: try 2-3 seeds first. If all fail, the problem is the prompt.
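The escalation rule can be sketched as a short loop. Here `render`, `rewrite`, and `passes` are hypothetical stand-ins for the real generation, re-prompting, and review steps; only the try-seeds-then-reprompt logic is from the text.

```python
# Sketch of the Round 3 seed-then-prompt escalation: try a few seeds
# before concluding the prompt itself is at fault. render / rewrite /
# passes are hypothetical stand-ins for the real pipeline steps.
def fix_panel(prompt, render, rewrite, passes, max_seeds=3):
    """Return (image, prompt), escalating to a rewritten prompt if needed."""
    for seed in range(max_seeds):
        image = render(prompt, seed)
        if passes(image):
            return image, prompt
    # All seeds failed: the problem is the prompt, not the randomness.
    new_prompt = rewrite(prompt)
    return render(new_prompt, 0), new_prompt
```

The design choice: seeds are cheap and prompts are expensive to rework, so the loop exhausts the cheap fix first.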
Round 4 — Text Errors
9 panels with garbled text
AI-generated text in non-English languages is notoriously unreliable. German umlauts, substitution errors.
Round 5 — Charts & Diagrams
3 visualization fixes
Complex diagrams moved to inline SVG. AI generation works for atmosphere, not technical accuracy.
Round 6 — Final Polish
5 panels for final pass
Dennis Ritchie portrait refinement. Last anachronism sweeps. V1+V2 shipped to hexley.dev.

"Not 'we haven't solved it.' We PROVED it CAN'T be solved."

— Scribble, Issue 1, Page 7