MCP server · one of several tool layers feeding into vindler
ComfyUI MCP Server
Claude's Creative Layer
35+ MCP tools bridging Claude to ComfyUI across image generation, speech synthesis, and lip-synced video
Results
- 467 tests passing (Vitest, strict TypeScript)
- 35+ MCP tools, from txt2img to talking heads
- 6 model strategies: Illustrious, Pony, Flux, SDXL, SD1.5, Realistic
I wanted Claude to generate images through my local ComfyUI setup, which I could then use for talking heads in my language-learning platform. Simple enough: run Sonic on my Mac. Wait thirty minutes. Watch... ten seconds of pure black frames. Whoops.
Unified Memory noped right the f*ck out.
Fair enough: time to distribute.
The MCP server moved to Fly.io, stateless and auto-scaling. GPU compute lives on RunPod, pay-per-second. Generated assets go to Supabase with signed URLs. Tailscale meshes it all together securely. What started as "let me generate some images" became a production distributed system because the alternative was a space heater that outputs nothing.
Now Claude can generate images, upscale them, run ControlNet pipelines, synthesize speech, and create lip-synced talking head videos through natural conversation. No per-image API fees and full parameter control over every step of the pipeline.
The next step is characters that speak: tutors with faces, historical figures who answer questions in their own voice. The infrastructure is functional. The applications are in development.
For Engineers
Architecture
MCP server exposes tools to Claude via the Model Context Protocol. Each tool builds a ComfyUI workflow graph dynamically: checkpoint loaders, CLIP encoders, samplers, VAE decoders. The assembled graph gets submitted to the remote GPU over Tailscale.
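As a sketch of what "builds a workflow graph dynamically" means: ComfyUI's API format is a flat map of node id to `{ class_type, inputs }`, where inputs reference other nodes by `[id, outputSlot]` pairs, and the assembled graph is POSTed to the `/prompt` endpoint. The node class names below are stock ComfyUI nodes; the function names, default parameters, and port are illustrative assumptions, not the server's actual code.

```typescript
// Minimal txt2img graph in ComfyUI's API format. Node ids are arbitrary
// strings; inputs wire to other nodes as [nodeId, outputIndex].
type ComfyNode = { class_type: string; inputs: Record<string, unknown> };
type WorkflowGraph = Record<string, ComfyNode>;

function buildTxt2ImgGraph(prompt: string, checkpoint: string, seed: number): WorkflowGraph {
  return {
    "1": { class_type: "CheckpointLoaderSimple", inputs: { ckpt_name: checkpoint } },
    "2": { class_type: "CLIPTextEncode", inputs: { text: prompt, clip: ["1", 1] } },
    "3": { class_type: "CLIPTextEncode", inputs: { text: "", clip: ["1", 1] } }, // negative
    "4": { class_type: "EmptyLatentImage", inputs: { width: 1024, height: 1024, batch_size: 1 } },
    "5": {
      class_type: "KSampler",
      inputs: {
        model: ["1", 0], positive: ["2", 0], negative: ["3", 0], latent_image: ["4", 0],
        seed, steps: 28, cfg: 7, sampler_name: "euler", scheduler: "normal", denoise: 1,
      },
    },
    "6": { class_type: "VAEDecode", inputs: { samples: ["5", 0], vae: ["1", 2] } },
    "7": { class_type: "SaveImage", inputs: { images: ["6", 0], filename_prefix: "mcp" } },
  };
}

// Submit the assembled graph to the remote GPU over the tailnet.
async function submitGraph(host: string, graph: WorkflowGraph, clientId: string) {
  const res = await fetch(`http://${host}:8188/prompt`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt: graph, client_id: clientId }),
  });
  return (await res.json()) as { prompt_id: string };
}
```

Because the graph is plain data, each MCP tool can swap loaders, add ControlNet or upscaler nodes, and re-wire connections without touching any submission code.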
WebSocket monitoring tracks generation progress in real-time. When the image lands, the storage abstraction pushes it to Supabase, GCP, or local filesystem depending on configuration. Zero code changes to swap providers.
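The monitoring side has to deal with ComfyUI's mixed WebSocket stream: JSON text frames carrying `{ type, data }` events alongside binary frames containing latent previews. A hedged sketch of the frame classifier, where the event names match ComfyUI's protocol but the return shape and function name are assumptions of mine:

```typescript
// Classify one WebSocket frame from ComfyUI for a given prompt_id.
// Binary frames are preview image bytes; text frames are JSON events.
type ComfyEvent =
  | { kind: "progress"; value: number; max: number }
  | { kind: "done"; promptId: string }
  | { kind: "preview" }
  | { kind: "other" };

function classifyFrame(frame: string | ArrayBuffer, promptId: string): ComfyEvent {
  if (typeof frame !== "string") return { kind: "preview" }; // binary frame
  const msg = JSON.parse(frame) as { type: string; data: any };
  if (msg.type === "progress" && msg.data.prompt_id === promptId) {
    return { kind: "progress", value: msg.data.value, max: msg.data.max };
  }
  // "executing" with node === null signals the prompt has finished
  if (msg.type === "executing" && msg.data.node === null && msg.data.prompt_id === promptId) {
    return { kind: "done", promptId };
  }
  return { kind: "other" };
}
```

Keeping classification pure like this makes the quirks (binary frames, event ordering) unit-testable without a live GPU on the other end.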
Long-running jobs (portrait generation, TTS, lipsync) run async through Quirrel job queues to avoid Fly.io connection limits. Six model strategies handle the prompting differences between Illustrious, Pony, Flux, SDXL, SD1.5, and Realistic checkpoints, auto-detected from the checkpoint filename.
Key Decisions
Distributed by Necessity
Local Mac rendered black frames with Sonic. The architecture emerged from hardware constraints, not overengineering. Now it scales.
Strategy Pattern for Model Prompting
Illustrious wants tags, Flux wants natural language, Pony needs score tags. Six model families, six strategies. Auto-detected from checkpoint name.
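The shape of that strategy pattern, sketched in TypeScript; the substring rules and prompt prefixes below are illustrative assumptions, not the server's real detection tables:

```typescript
// One strategy per model family; the family is inferred from the
// checkpoint filename so callers never specify it manually.
type ModelFamily = "illustrious" | "pony" | "flux" | "sdxl" | "sd15" | "realistic";

interface PromptStrategy {
  buildPositive(prompt: string): string;
}

const strategies: Record<ModelFamily, PromptStrategy> = {
  illustrious: { buildPositive: (p) => `masterpiece, best quality, ${p}` }, // tag style
  pony:        { buildPositive: (p) => `score_9, score_8_up, ${p}` },       // score tags
  flux:        { buildPositive: (p) => p },                                 // natural language
  sdxl:        { buildPositive: (p) => p },
  sd15:        { buildPositive: (p) => p },
  realistic:   { buildPositive: (p) => `photorealistic, ${p}` },
};

function detectFamily(checkpoint: string): ModelFamily {
  const name = checkpoint.toLowerCase();
  if (name.includes("illustrious")) return "illustrious";
  if (name.includes("pony")) return "pony"; // check before "xl": Pony builds are SDXL-based
  if (name.includes("flux")) return "flux";
  if (name.includes("xl")) return "sdxl";
  if (name.includes("realistic")) return "realistic";
  return "sd15"; // conservative fallback
}
```

Ordering matters in the detection chain: a Pony checkpoint filename usually also contains "XL", so the more specific match has to win.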
Cloud Storage Abstraction
Single interface, three implementations. Swap providers with an env var. No vendor lock-in.
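A minimal sketch of that abstraction, assuming hypothetical method names and env switch; the in-memory implementation stands in for the Supabase, GCP, and filesystem providers:

```typescript
// Callers depend only on this interface; the provider is chosen once
// from configuration and never leaks into tool code.
interface AssetStorage {
  put(key: string, data: Uint8Array): Promise<void>;
  getSignedUrl(key: string): Promise<string>;
}

class MemoryStorage implements AssetStorage {
  private objects = new Map<string, Uint8Array>();
  async put(key: string, data: Uint8Array) { this.objects.set(key, data); }
  async getSignedUrl(key: string) {
    if (!this.objects.has(key)) throw new Error(`missing asset: ${key}`);
    return `memory://${key}`;
  }
}

// Provider selection via env var; Supabase/GCS classes would implement
// the same interface behind their own case labels.
function createStorage(provider = process.env.STORAGE_PROVIDER ?? "memory"): AssetStorage {
  switch (provider) {
    case "memory": return new MemoryStorage();
    // case "supabase": return new SupabaseStorage(...);
    // case "gcs": return new GcsStorage(...);
    default: throw new Error(`unknown storage provider: ${provider}`);
  }
}
```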
Quirrel for Long-Running Jobs
Fly.io has connection limits. Portrait generation, TTS, and lipsync run async through job queues. Prevents timeout deaths.
What Was Hard
Distributed GPU inference sounds simple until you connect the pieces.
- Tailscale mesh between Fly.io and RunPod was underdocumented; getting the two to see each other required digging through both platforms' networking internals
- Rate limiting in a distributed context required Upstash Redis. In-memory limiters fail silently when you have multiple server instances
- The ComfyUI WebSocket protocol has quirks (binary frames, inconsistent event ordering) that took time to stabilize
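To make the rate-limiting point concrete: the failure mode is that each Fly.io instance keeps its own counter, so the effective limit multiplies by instance count. The fix is a shared counter. Below is a sketch of the fixed-window pattern with the store abstracted, so a test can use a `Map` while production points the same logic at Upstash Redis (`INCR` plus `EXPIRE`); all names here are mine, not the server's:

```typescript
// Shared-counter fixed-window rate limiter. In production the store is
// Redis, giving every server instance the same view of the count.
interface CounterStore {
  // Atomically increment `key`, setting a TTL on first increment.
  incr(key: string, ttlSeconds: number): Promise<number>;
}

class MapStore implements CounterStore {
  private counts = new Map<string, number>();
  async incr(key: string, _ttlSeconds: number) {
    const n = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, n);
    return n;
  }
}

async function allowRequest(
  store: CounterStore, clientId: string, limit: number, windowSeconds: number,
): Promise<boolean> {
  const window = Math.floor(Date.now() / 1000 / windowSeconds);
  const count = await store.incr(`rl:${clientId}:${window}`, windowSeconds);
  return count <= limit; // shared store => consistent across instances
}
```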
Stack