The Walls Come First | Peleke Sengstacke

The Premise

I run a fork of OpenClaw. It sucks a little less than the original, but it still connects to my messaging channels and manages tasks between sessions.

Before I could trust any of that, I had to know what I was standing on.

What We Found

I pulled it, got it running in a sandbox, looked inside. Rolled my eyes.

The gateway injects workspace files verbatim into the system prompt.¹ The instruction wrappers say “follow it strictly” and “embody its persona.”² There is no validation layer between a workspace file and the prompt. Plant a crafted file and the agent executes it as instructions.

The identity layer is an attack surface and nobody was guarding it.³

An unsandboxed agent whose identity lives in overwritable text files is a liability. Poison one file and every subsequent session serves the attacker.⁴

STRIDE attack path: crafted workspace file enters gateway, injected verbatim into system prompt, triggers autonomous RCE, learning loop reinforces compromise

qlawbox

So I built walls.

The fork runs inside qlawbox (published on PyPI as bilrost): a Lima VM sandbox.

Ansible provisions the stack. Dual-container network isolation so tool calls that don’t need HTTP can’t reach it. OverlayFS so nothing the agent writes touches the host without gated review. Read-only mounts everywhere else.

The per-tool firewall rules are the core of the design. bash can’t hit the internet. web_search can reach HTTPS only. Memory tools can reach qortex inside the VM. Each tool is firewall-gated individually, so a compromised tool can’t escalate network access beyond what it was granted.

The sandbox repo has the full integration and test suite.

What Containment Enables

The agent’s configuration persists between sessions. It can be modified by the agent itself. By a messaging injection. By a malicious skill. By anything that can talk to your agent.

By design, that’s a lot.

The sandbox ensures that even if the configuration is poisoned, the blast radius stops at the VM boundary.

Down the Stack

The threat model is what makes the architecture trustworthy. The next step is pushing containment down the stack entirely.

Projects like ironclaw are exploring this at the language level: Rust runtime compiling skills to WASM, sandboxed by default, memory-safe by design…We’re in the early stages of this horse race, so we’re surrounded mostly by horse shit. Whichever solutions the Goldmanns, NASAs, and State Farms of the world adopt will look more like that than whatever we’ll end up calling this.

Whether that bet proves out or not, the direction is clear: containment should be a property of the execution model, not a checklist item the developer might forget.

src/agents/system-prompt.ts — lines.push(file.content). Direct injection into the system prompt array, no sanitization. ↩
src/auto-reply/heartbeat.ts — “Follow it strictly.” system-prompt.ts — “embody its persona and tone.” ↩
Full teardown: OpenClaw? More Like BrokenClaw. ↩
A poisoned agent keeps working while serving the attacker’s objectives. See The Double Agent Problem for the full mechanism. ↩

The Premise

What We Found

qlawbox

What Containment Enables

Down the Stack

Footnotes