Field Notes
16,000 Lines of Wrong
This is part of a series on building agents that learn.
The Scope
buildlog was the exploration.
16,000 lines of Python. The project started as a question: what if you could capture every engineering decision you make, extract patterns from the record, and enforce the good ones automatically? A full pipeline: structured capture of engineering decisions and bandit-based enforcement of extracted patterns via git and Claude hooks.
The idea was right: I use it every day. Most of the implementation was… experimental. But wrong code is only wasted if you don’t learn from where it breaks.
What Survived
Three things came out of it.
1. The Gauntlet Was Right
The gauntlet survived. A rule-based review system where custom rules gate PRs and commits. No LLM in the loop. You define rules; the rules check your code; the rules block or pass.
The gauntlet works because it’s boring. Rules are explicit and deterministic. You can read them and test them.
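A minimal sketch of that shape, assuming a pre-commit entry point (the rule names and the `Rule`/`run_gauntlet` API here are hypothetical illustrations, not buildlog's actual code):

```python
import re
import sys
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    """A deterministic check: no LLM, no heuristics, just a predicate."""
    name: str
    check: Callable[[str], bool]  # returns True if the staged diff passes
    message: str

# Illustrative rules -- not buildlog's real rule set.
RULES = [
    Rule(
        name="no-print-debugging",
        check=lambda diff: not re.search(r"^\+.*\bprint\(", diff, re.MULTILINE),
        message="remove print() debugging before committing",
    ),
    Rule(
        name="no-todo-without-ticket",
        check=lambda diff: not re.search(r"^\+.*TODO(?!\(#\d+\))", diff, re.MULTILINE),
        message="every TODO needs a ticket reference, e.g. TODO(#123)",
    ),
]

def run_gauntlet(diff: str) -> int:
    """Run every rule against the diff; a nonzero exit blocks the commit."""
    failures = [r for r in RULES if not r.check(diff)]
    for r in failures:
        print(f"gauntlet: {r.name}: {r.message}", file=sys.stderr)
    return 1 if failures else 0
```

Wired into a `pre-commit` hook, the output of `git diff --cached` goes in and an exit code comes out. Readable, testable, and nothing to argue with.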
It runs in production and enforces the learning loop mechanically via git hooks: every commit passes through the gauntlet before it lands.
That piece was always the simplest part of the system. Turns out the simplest part was the only part that was right. The gauntlet still runs. The template is at v0.20.0 and still shipping updates.
2. Rule Derivation Was Wrong
The hard part: taking raw decision-outcome pairs, extracting patterns, then deciding which ones earn their token cost in the next session.
No sh*t: this is f*cking hard, and no one is doing it.
The LLM-as-judge instrumentation was a distraction. Routing every commit through a language model to “evaluate quality” sounded good in the design doc. In practice, it added latency to every commit and produced ratings too inconsistent to learn from.
Automated seed extraction was the second mistake. buildlog tried to extract “engineering patterns” from journal entries using LLM summarization. The results were plausible-sounding rules that hadn’t been validated against real outcomes. Plausible and correct are different things.
Both features shared the same flaw: they used LLM calls where structured data would have been cheaper and more reliable.
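What “structured data” means here, sketched as a plain record (the field names are mine, not buildlog’s actual schema): capturing this takes no LLM call, and downstream learning can count outcomes directly instead of parsing prose.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class DecisionOutcome:
    """Hypothetical shape of a structured decision-outcome pair.

    Field names are illustrative, not buildlog's schema.
    """
    decision: str          # what was chosen, in one line
    alternative: str       # what was rejected
    context: str           # where it applied (file, module, task)
    outcome: bool          # did the decision hold up?
    recorded_at: datetime
```

A record like this is cheap to write at decision time and unambiguous to aggregate later; a paragraph of LLM summary is neither.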
The qortex architecture exists because of what buildlog got wrong. Thompson Sampling replaced LLM-based evaluation. Instead of LLM-summarized patterns, we used structured decision-outcome pairs. The flat rule store gave way to a knowledge graph. Each correction came from watching buildlog fail at a specific task and engineering a solution to something the LLM had previously simply “filled in.”
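A hedged sketch of Thompson Sampling in this role (the `RuleArm` names and budget mechanics are my illustration, not qortex’s actual code): each rule keeps a Beta posterior over “including this rule helped the session,” and each session draws from the posteriors to decide which rules earn their token cost.

```python
import random
from dataclasses import dataclass

@dataclass
class RuleArm:
    """Beta-Bernoulli posterior over 'including this rule helped'."""
    name: str
    successes: int = 1  # Beta(1, 1) uniform prior
    failures: int = 1

    def sample(self) -> float:
        """Draw one plausible success rate from the posterior."""
        return random.betavariate(self.successes, self.failures)

    def update(self, helped: bool) -> None:
        """Fold the observed session outcome back into the posterior."""
        if helped:
            self.successes += 1
        else:
            self.failures += 1

def select_rules(arms: list[RuleArm], budget: int) -> list[RuleArm]:
    """Thompson Sampling: sample each posterior, keep the top `budget` draws."""
    return sorted(arms, key=lambda a: a.sample(), reverse=True)[:budget]
```

After each session, every included arm gets an `update` with the observed outcome. Rules that keep failing drift out of the budget on their own, with no LLM judging anything.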
Make It Look Easy covers the protocol architecture that emerged from this process.
3. The Entries Were Right
The readable logs survived. buildlog’s journal entries, structured markdown capturing what happened and why, turned out to be the single output worth keeping from the entire project.
They fed the writing, not the extraction pipeline.
Forty-two entries over four months, and dozens of them became published articles.
The entries are what taught us to write about this work. This very article exists because buildlog made “write down what you did and what you learned” a mechanical habit enforced by git hooks. Once the entries existed, the writing followed.
That pipeline kept going:
We didn’t stop running it. Logging a build journal was low-friction enough that the pipeline scaled: entries became blog posts, blog posts became fodder for LinWheel distribution. No step requires starting from scratch.
The Shape of the Thing
Capture decisions, make them readable, let the writing emerge from the record.
The idea was right. Most of the implementation was wrong. Wrong code teaches you the right abstractions if you pay attention to where it breaks.