Live 2026

ComfyUI MCP Server

Claude's Creative Layer

Images, portraits, speech, lip-sync—Claude's creative layer for whatever you can describe

Yes, this image was generated by the tool. Built for internal asset generation—use responsibly.

467 tests passing · Vitest, strict TypeScript

35+ MCP tools · txt2img to talking heads

6 model strategies · Illustrious, Pony, Flux, SDXL, SD1.5, Realistic

I wanted Claude to generate images through my local ComfyUI setup. Simple enough. Ran Sonic on my Mac. Ten seconds of pure black frames. The GPU couldn't keep up.

So I distributed it.

The MCP server runs on Fly.io—stateless, auto-scaling. GPU compute lives on RunPod, pay-per-second. Generated assets go to Supabase with signed URLs. Tailscale meshes it all together securely. What started as "let me generate some images" became a production distributed system because the alternative was a space heater that outputs nothing.

Now Claude can generate images, upscale them, run ControlNet pipelines, synthesize speech, and create lip-synced talking head videos—all through natural conversation. No API fees. Full parameter control. The kind of setup that makes you dangerous.

Where this goes: Characters that speak. Tutors with faces. Historical figures who answer questions in their own voice. The infrastructure is here. The applications are next.

Architecture

MCP server exposes tools to Claude via the Model Context Protocol. Each tool builds a ComfyUI workflow graph dynamically—checkpoint loaders, CLIP encoders, samplers, VAE decoders—then submits it to the remote GPU. WebSocket monitoring tracks progress in real time. Storage abstraction supports Supabase, GCP, or local filesystem with zero code changes.

┌─── Claude (MCP Client) ───┐
│   Natural language req    │
└──────────┬────────────────┘
           ↓
┌─── Fly.io (MCP Server) ───┐
│  35+ tools, rate limiting │
│  Upstash Redis, Quirrel   │
└──────────┬────────────────┘
           ↓ (Tailscale mesh)
┌─── RunPod (GPU) ──────────┐
│  ComfyUI on RTX 4090      │
│  Pay-per-second compute   │
└──────────┬────────────────┘
           ↓
┌─── Supabase (Storage) ────┐
│  Signed URLs, 1hr expiry  │
└───────────────────────────┘
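A minimal sketch of what one such tool can look like, assuming the official TypeScript MCP SDK and ComfyUI's JSON prompt API. The node wiring, sampler settings, and the COMFY_URL variable are illustrative placeholders, not the project's actual code.

// Sketch: register a txt2img tool that builds a ComfyUI graph and queues it on the GPU box.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

const COMFY_URL = process.env.COMFY_URL ?? "http://comfyui:8188"; // reached over the Tailscale mesh

const server = new McpServer({ name: "comfyui-mcp", version: "1.0.0" });

server.tool(
  "txt2img",
  { prompt: z.string(), checkpoint: z.string().default("sdxl_base.safetensors") },
  async ({ prompt, checkpoint }) => {
    // ComfyUI workflows are plain JSON graphs: node id -> { class_type, inputs }.
    const graph = {
      "1": { class_type: "CheckpointLoaderSimple", inputs: { ckpt_name: checkpoint } },
      "2": { class_type: "CLIPTextEncode", inputs: { text: prompt, clip: ["1", 1] } },
      "3": { class_type: "CLIPTextEncode", inputs: { text: "blurry, lowres", clip: ["1", 1] } },
      "4": { class_type: "EmptyLatentImage", inputs: { width: 1024, height: 1024, batch_size: 1 } },
      "5": {
        class_type: "KSampler",
        inputs: {
          model: ["1", 0], positive: ["2", 0], negative: ["3", 0], latent_image: ["4", 0],
          seed: Date.now(), steps: 28, cfg: 6.5, sampler_name: "euler", scheduler: "normal", denoise: 1,
        },
      },
      "6": { class_type: "VAEDecode", inputs: { samples: ["5", 0], vae: ["1", 2] } },
      "7": { class_type: "SaveImage", inputs: { images: ["6", 0], filename_prefix: "mcp" } },
    };
    // Queue the graph; the real server then follows progress over ComfyUI's WebSocket.
    const res = await fetch(`${COMFY_URL}/prompt`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ prompt: graph }),
    });
    const { prompt_id } = await res.json();
    return { content: [{ type: "text", text: `Queued ComfyUI job ${prompt_id}` }] };
  },
);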

Key Decisions

Distributed by Necessity

Local Mac rendered black frames with Sonic. The architecture emerged from hardware constraints, not overengineering. Now it scales.

Strategy Pattern for Model Prompting

Illustrious wants tags, Flux wants natural language, Pony needs score tags. Six model families, six strategies. Auto-detected from checkpoint name.
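A rough sketch of the idea, assuming a small strategy interface keyed off the checkpoint filename. The interface shape, tag lists, and detection rules here are illustrative, not the project's exact implementation.

// Each model family gets its own prompt builder.
interface PromptStrategy {
  buildPrompt(request: string): { positive: string; negative: string };
}

// Pony checkpoints expect score tags prepended to the prompt.
const pony: PromptStrategy = {
  buildPrompt: (req) => ({
    positive: `score_9, score_8_up, score_7_up, ${req}`,
    negative: "score_4, score_3, score_2",
  }),
};

// Flux responds best to plain natural language, no tag soup.
const flux: PromptStrategy = {
  buildPrompt: (req) => ({ positive: req, negative: "" }),
};

// Auto-detect the family from the checkpoint name; fall back to natural language.
function strategyFor(checkpoint: string): PromptStrategy {
  const name = checkpoint.toLowerCase();
  if (name.includes("pony")) return pony;
  if (name.includes("flux")) return flux;
  return flux;
}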

Cloud Storage Abstraction

Single interface, three implementations. Swap providers with an env var. No vendor lock-in.
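A sketch of the abstraction with one concrete provider shown; the bucket name, env var, and interface shape are assumptions made for the example.

// One interface; Supabase, GCP, and local filesystem each implement it.
import { createClient } from "@supabase/supabase-js";

interface StorageProvider {
  upload(path: string, data: Buffer, contentType: string): Promise<void>;
  getSignedUrl(path: string, expiresInSeconds: number): Promise<string>;
}

class SupabaseStorage implements StorageProvider {
  private client = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_KEY!);
  async upload(path: string, data: Buffer, contentType: string) {
    const { error } = await this.client.storage.from("assets").upload(path, data, { contentType });
    if (error) throw error;
  }
  async getSignedUrl(path: string, expiresInSeconds: number) {
    const { data, error } = await this.client.storage.from("assets").createSignedUrl(path, expiresInSeconds);
    if (error || !data) throw error ?? new Error("no signed URL returned");
    return data.signedUrl;
  }
}

// Provider chosen once at startup; the rest of the code only sees StorageProvider.
function storageFromEnv(): StorageProvider {
  switch (process.env.STORAGE_PROVIDER) {
    // case "gcp": return new GcsStorage();      // same interface, @google-cloud/storage underneath
    // case "local": return new LocalFsStorage(); // writes to disk for development
    default: return new SupabaseStorage();
  }
}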

Quirrel for Long-Running Jobs

Fly.io has connection limits. Portrait generation, TTS, and lip-sync run async through job queues. Prevents timeout deaths.
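The queue-backed flow looks roughly like the sketch below, using Quirrel's Queue helper; the import path (the Next.js-style adapter), route name, and payload shape are assumptions rather than the project's actual setup.

import { Queue } from "quirrel/next";

// Worker: Quirrel calls this route back with the payload and runs the handler
// outside the original request, so the MCP connection never waits on the GPU.
export const portraitQueue = Queue(
  "api/queues/portrait",
  async (job: { prompt: string; checkpoint: string }) => {
    // build the ComfyUI graph, wait for the GPU, upload the result, record the signed URL
  },
);

// Tool handler side: enqueue and hand back a job id instead of holding the connection open.
export async function startPortraitJob(prompt: string, checkpoint: string) {
  const job = await portraitQueue.enqueue({ prompt, checkpoint });
  return job.id;
}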

What Was Hard

Tailscale mesh setup between Fly.io and RunPod was underdocumented. Rate limiting in a distributed context required Upstash Redis—in-memory limiters fail when you have multiple instances. The ComfyUI WebSocket protocol has quirks that took time to stabilize.
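The distributed rate limiting ends up looking roughly like this sketch, with @upstash/ratelimit backed by Redis so every Fly.io instance shares the same counters; the 10-requests-per-minute window is an illustrative number, not the server's actual setting.

import { Redis } from "@upstash/redis";
import { Ratelimit } from "@upstash/ratelimit";

// One shared limiter backed by Upstash Redis: counters survive across instances,
// unlike an in-memory bucket that resets on every new Fly.io machine.
const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(), // reads UPSTASH_REDIS_REST_URL / UPSTASH_REDIS_REST_TOKEN
  limiter: Ratelimit.slidingWindow(10, "1 m"),
});

export async function allowRequest(clientId: string): Promise<boolean> {
  const { success } = await ratelimit.limit(clientId);
  return success;
}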

TypeScript · MCP SDK · Hono · Zod · Vitest · Fly.io · RunPod · Supabase · Tailscale · Upstash Redis · Quirrel