ComfyUI MCP Server
Claude's Creative Layer
Images, portraits, speech, lip-sync—Claude's creative layer for whatever you can describe
Results
467 tests passing (Vitest, strict TypeScript)
35+ MCP tools (txt2img to talking heads)
6 model strategies (Illustrious, Pony, Flux, SDXL, SD1.5, Realistic)
I wanted Claude to generate images through my local ComfyUI setup. Simple enough. Then I ran Sonic for lip-sync video on my Mac. Ten seconds of pure black frames. The GPU couldn't keep up.
So I distributed it.
The MCP server runs on Fly.io—stateless, auto-scaling. GPU compute lives on RunPod, pay-per-second. Generated assets go to Supabase with signed URLs. Tailscale meshes it all together securely. What started as "let me generate some images" became a production distributed system because the alternative was a space heater that outputs nothing.
Now Claude can generate images, upscale them, run ControlNet pipelines, synthesize speech, and create lip-synced talking head videos—all through natural conversation. No API fees. Full parameter control. The kind of setup that makes you dangerous.
Where this goes: Characters that speak. Tutors with faces. Historical figures who answer questions in their own voice. The infrastructure is here. The applications are next.
For Engineers
Architecture
The MCP server exposes tools to Claude via the Model Context Protocol. Each tool builds a ComfyUI workflow graph dynamically (checkpoint loaders, CLIP encoders, samplers, VAE decoders), then submits it to the remote GPU. WebSocket monitoring tracks progress in real time. A storage abstraction supports Supabase, GCP, or the local filesystem with zero code changes.
┌─── Claude (MCP Client) ───┐
│ Natural language request  │
└──────────┬────────────────┘
           ↓
┌─── Fly.io (MCP Server) ───┐
│ 35+ tools, rate limiting  │
│ Upstash Redis, Quirrel    │
└──────────┬────────────────┘
           ↓ (Tailscale mesh)
┌─── RunPod (GPU) ──────────┐
│ ComfyUI on RTX 4090       │
│ Pay-per-second compute    │
└──────────┬────────────────┘
           ↓
┌─── Supabase (Storage) ────┐
│ Signed URLs, 1hr expiry   │
└───────────────────────────┘
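To make the "builds a ComfyUI workflow graph dynamically" step concrete, here is a minimal sketch of a txt2img graph in ComfyUI's API format, where each node has a `class_type` and `inputs`, and a link is written as `[sourceNodeId, outputIndex]`. The `Txt2ImgParams` shape, the function names, the fixed sampler settings, and the `mcp` filename prefix are assumptions for illustration, not the project's actual code.

```typescript
// A link to another node's output: [sourceNodeId, outputIndex].
type NodeLink = [string, number];
type InputValue = string | number | NodeLink;

interface WorkflowNode {
  class_type: string;
  inputs: Record<string, InputValue>;
}

type Workflow = Record<string, WorkflowNode>;

// Hypothetical parameter shape for this sketch.
interface Txt2ImgParams {
  checkpoint: string;
  prompt: string;
  negativePrompt: string;
  width: number;
  height: number;
  steps: number;
  cfg: number;
  seed: number;
}

// Build the graph: checkpoint -> two CLIP encodes -> sampler -> VAE decode -> save.
function buildTxt2ImgWorkflow(p: Txt2ImgParams): Workflow {
  return {
    "1": { class_type: "CheckpointLoaderSimple", inputs: { ckpt_name: p.checkpoint } },
    "2": { class_type: "CLIPTextEncode", inputs: { text: p.prompt, clip: ["1", 1] } },
    "3": { class_type: "CLIPTextEncode", inputs: { text: p.negativePrompt, clip: ["1", 1] } },
    "4": {
      class_type: "EmptyLatentImage",
      inputs: { width: p.width, height: p.height, batch_size: 1 },
    },
    "5": {
      class_type: "KSampler",
      inputs: {
        model: ["1", 0],
        positive: ["2", 0],
        negative: ["3", 0],
        latent_image: ["4", 0],
        seed: p.seed,
        steps: p.steps,
        cfg: p.cfg,
        sampler_name: "euler", // assumed default for the sketch
        scheduler: "normal", // assumed default for the sketch
        denoise: 1,
      },
    },
    "6": { class_type: "VAEDecode", inputs: { samples: ["5", 0], vae: ["1", 2] } },
    "7": { class_type: "SaveImage", inputs: { images: ["6", 0], filename_prefix: "mcp" } },
  };
}

// Sanity check before submitting: every link must point at a node that exists.
function findDanglingLinks(workflow: Workflow): string[] {
  const problems: string[] = [];
  for (const [id, node] of Object.entries(workflow)) {
    for (const [name, value] of Object.entries(node.inputs)) {
      if (Array.isArray(value) && !(value[0] in workflow)) {
        problems.push(`${id}.${name} -> ${value[0]}`);
      }
    }
  }
  return problems;
}
```

In ComfyUI's HTTP API, a graph like this is typically sent as the `prompt` field of a JSON POST to `/prompt`. Checking for dangling links first is cheap and catches wiring mistakes before they cost GPU seconds.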
Key Decisions
Distributed by Necessity
My local Mac rendered black frames with Sonic. The architecture emerged from hardware constraints, not overengineering. Now it scales.
Strategy Pattern for Model Prompting
Illustrious wants tags, Flux wants natural language, Pony needs score tags. Six model families, six strategies. Auto-detected from checkpoint name.
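A rough sketch of how that strategy lookup might work, covering three of the six families. The regex patterns, the fallback family, and the specific Pony score tags are assumptions for illustration; the real detection rules are not shown here.

```typescript
interface PromptStrategy {
  family: string;
  format(description: string): string;
}

// Normalize a comma-separated description into clean "tag, tag" form.
const toTags = (d: string): string =>
  d
    .split(",")
    .map((t) => t.trim())
    .filter((t) => t.length > 0)
    .join(", ");

// Hypothetical detection table: first matching pattern wins.
const strategies: { pattern: RegExp; strategy: PromptStrategy }[] = [
  { pattern: /illustrious/i, strategy: { family: "illustrious", format: toTags } },
  {
    pattern: /pony/i,
    strategy: {
      family: "pony",
      // Pony-style checkpoints are usually prompted with leading score tags.
      format: (d) => `score_9, score_8_up, score_7_up, ${toTags(d)}`,
    },
  },
  // Flux takes natural language, so the description passes through untouched.
  { pattern: /flux/i, strategy: { family: "flux", format: (d) => d.trim() } },
];

// Assumed fallback when the checkpoint name matches nothing.
const fallback: PromptStrategy = { family: "sdxl", format: toTags };

function detectStrategy(checkpointName: string): PromptStrategy {
  const match = strategies.find((s) => s.pattern.test(checkpointName));
  return match ? match.strategy : fallback;
}
```

Each tool can then call `detectStrategy(checkpoint).format(userPrompt)` without knowing which family it is talking to, and adding a family means adding one entry to the table.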
Cloud Storage Abstraction
Single interface, three implementations. Swap providers with an env var. No vendor lock-in.
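One way the single-interface idea could look, as a sketch. The `StorageProvider` method names and the `STORAGE_PROVIDER` variable mentioned below are hypothetical. The Supabase and GCP backends are left as stubs so no real SDK calls are invented, and the local backend is an in-memory stand-in rather than a real filesystem writer.

```typescript
interface StorageProvider {
  upload(key: string, data: Uint8Array): Promise<void>;
  getUrl(key: string, expiresInSeconds: number): Promise<string>;
}

// In-memory stand-in for the local backend, enough to exercise the interface.
class InMemoryStorage implements StorageProvider {
  private files = new Map<string, Uint8Array>();

  async upload(key: string, data: Uint8Array): Promise<void> {
    this.files.set(key, data);
  }

  async getUrl(key: string, expiresInSeconds: number): Promise<string> {
    if (!this.files.has(key)) throw new Error(`No such object: ${key}`);
    return `memory://${key}?expires_in=${expiresInSeconds}`;
  }
}

// Placeholder for the cloud backends; a real one would wrap the vendor SDK.
class StubStorage implements StorageProvider {
  private readonly name: string;

  constructor(name: string) {
    this.name = name;
  }

  async upload(): Promise<void> {
    throw new Error(`${this.name} backend is not implemented in this sketch`);
  }

  async getUrl(): Promise<string> {
    throw new Error(`${this.name} backend is not implemented in this sketch`);
  }
}

// Pick a backend from a config value; callers only ever see StorageProvider.
function createStorage(provider: string | undefined): StorageProvider {
  const name = provider ?? "local";
  switch (name) {
    case "local":
      return new InMemoryStorage();
    case "supabase":
    case "gcp":
      return new StubStorage(name);
    default:
      throw new Error(`Unknown storage provider: ${name}`);
  }
}
```

At startup this would be called once with something like `createStorage(process.env.STORAGE_PROVIDER)`, so switching providers touches configuration only.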
Quirrel for Long-Running Jobs
Fly.io has connection limits, so portrait generation, TTS, and lip-sync run asynchronously through Quirrel job queues. That keeps long jobs from dying on timeouts.
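The general pattern behind async jobs is submit now, poll later. The sketch below illustrates it with a process-local registry. This is not Quirrel's API, and the `JobRegistry` class and its method names are hypothetical. A real queue keeps job state outside the server process, so jobs survive restarts and scale-out.

```typescript
type JobStatus<T> =
  | { state: "pending" }
  | { state: "done"; result: T }
  | { state: "failed"; error: string };

class JobRegistry<T> {
  private jobs = new Map<string, JobStatus<T>>();
  private nextId = 1;

  // Kick off the work without awaiting it and hand back an ID immediately,
  // so the original request can return long before the job finishes.
  submit(work: () => Promise<T>): string {
    const id = `job-${this.nextId++}`;
    this.jobs.set(id, { state: "pending" });
    Promise.resolve()
      .then(work)
      .then(
        (result) => {
          this.jobs.set(id, { state: "done", result });
        },
        (err: unknown) => {
          const error = err instanceof Error ? err.message : String(err);
          this.jobs.set(id, { state: "failed", error });
        },
      );
    return id;
  }

  // Called by a follow-up "check status" tool; undefined means unknown ID.
  status(id: string): JobStatus<T> | undefined {
    return this.jobs.get(id);
  }
}
```

A lip-sync tool would return the job ID right away, and a second tool call would read `status(id)` until it reports `done` or `failed`. No single connection has to stay open for the length of the render.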
What Was Hard
Setting up the Tailscale mesh between Fly.io and RunPod was underdocumented. Rate limiting in a distributed context required Upstash Redis, because in-memory limiters fail when you have multiple instances: each instance keeps its own count. The ComfyUI WebSocket protocol has quirks that took time to stabilize.
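To show why the counter has to live in a shared store, here is a minimal fixed-window limiter written against a small `CounterStore` interface. The interface, class names, and key format are assumptions for illustration. In production the store would be backed by Redis (an atomic increment plus an expiry), and `MemoryStore` below only stands in for it so the sketch runs on its own.

```typescript
interface CounterStore {
  // Atomically increment the counter at `key` and return the new value.
  // A Redis-backed version would also set the key to expire after ttlSeconds.
  incr(key: string, ttlSeconds: number): Promise<number>;
}

// Stand-in for the shared store. It ignores the TTL and never evicts keys.
class MemoryStore implements CounterStore {
  private counts = new Map<string, number>();

  async incr(key: string, _ttlSeconds: number): Promise<number> {
    const next = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, next);
    return next;
  }
}

class FixedWindowLimiter {
  private readonly store: CounterStore;
  private readonly limit: number;
  private readonly windowSeconds: number;

  constructor(store: CounterStore, limit: number, windowSeconds: number) {
    this.store = store;
    this.limit = limit;
    this.windowSeconds = windowSeconds;
  }

  // Requests in the same time window share one counter key per client.
  async allow(clientId: string, nowMs: number = Date.now()): Promise<boolean> {
    const windowIndex = Math.floor(nowMs / (this.windowSeconds * 1000));
    const key = `rl:${clientId}:${windowIndex}`;
    const count = await this.store.incr(key, this.windowSeconds);
    return count <= this.limit;
  }
}
```

Two server instances that each hold their own `FixedWindowLimiter` but point at the same store will agree on the count. Give each instance its own in-memory store and a client effectively gets the limit multiplied by the number of instances.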
Stack