Agents Don't Learn

February 23, 2026

Every agent on the market wakes up blank.

Your $200/month Claude subscription. Your custom GPT. Your Cursor workspace with 47 rules files you hand-wrote because the machine can’t remember what you told it yesterday. Every single one of these systems starts each session at zero.

You are the memory. You are the learning loop.

You re-explain your codebase, your preferences, your conventions, your past mistakes, every time, from scratch.

The default is still stateless. The session model has no feedback path: nothing an agent does in one session can change how it behaves in the next. Every major agent framework treats this as a feature gap rather than the structural failure it is.

Learning means changing what you do based on what happened last time. Instead, we got “memory” products: RAG stapled to a chatbot. RAG stapled to a chatbot does not learn; retrieval changes what the model sees, not what it decides. Adding more storage does not change that.

Does the system make better decisions after 100 sessions than after 1? The answer, everywhere, is no.
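A “yes” is not mysterious. Here is a deliberately crude toy (every name in it is invented): a stateless chooser picks between two strategies with unknown success rates and never improves, while anything that feeds outcomes back does.

```python
import random

# Toy illustration; the two "strategies" and their success rates are
# invented stand-ins for tools or prompts an agent could choose between.
TRUE_RATES = {"strategy_a": 0.3, "strategy_b": 0.8}

def run(choose, update, sessions=100):
    wins = 0
    for _ in range(sessions):
        arm = choose()
        success = random.random() < TRUE_RATES[arm]
        update(arm, success)
        wins += success
    return wins / sessions

# Stateless agent: every session starts at zero, so the choice is a
# coin flip forever. No outcome ever changes the next decision.
stateless = run(lambda: random.choice(list(TRUE_RATES)),
                lambda arm, success: None)

# Crude learner: keep per-strategy counts, pick the empirical best
# (with Laplace smoothing so untried strategies start at 0.5).
stats = {arm: {"wins": 0, "trials": 0} for arm in TRUE_RATES}

def pick():
    return max(TRUE_RATES, key=lambda a:
               (stats[a]["wins"] + 1) / (stats[a]["trials"] + 2))

def learn(arm, success):
    stats[arm]["wins"] += success
    stats[arm]["trials"] += 1

learned = run(pick, learn)
print(f"stateless: {stateless:.2f}  learned: {learned:.2f}")
```

A greedy counter is a bad policy, and fixing that is what Thompson Sampling is for, but it typically passes the test above: session 100 beats session 1.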

The cost compounds: every session that starts at zero re-introduces mistakes that a learning system would have caught by session five. The agent makes the same wrong tool selection and hits the same context ceiling. It burns the same tokens on the same dead ends. You absorb the cost in review time. Your team absorbs it in velocity.

We’ve been building a feedback loop that tunes tool selection, context injection, and skill activation based on session outcomes. The series below is the record: what we tried, what broke, and what survived.
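In outline the loop is small; everything interesting is in the details. A skeleton under stated assumptions: `run_session` and `score` are hypothetical stand-ins for the agent harness and the outcome metric, and the ranking below is a placeholder for the real policy.

```python
from dataclasses import dataclass

@dataclass
class Arm:
    name: str
    kind: str          # "tool" | "context" | "skill"
    wins: int = 0
    trials: int = 0

def select(arms: list[Arm], k: int = 3) -> list[Arm]:
    # Placeholder policy: rank by smoothed win rate and take the top k.
    return sorted(arms,
                  key=lambda a: (a.wins + 1) / (a.trials + 2),
                  reverse=True)[:k]

def session_loop(arms: list[Arm], run_session, score, sessions: int = 100):
    for _ in range(sessions):
        active = select(arms)                 # inject tools/context/skills
        outcome = score(run_session(active))  # 1 if the session succeeded
        for arm in active:                    # credit every active arm
            arm.trials += 1
            arm.wins += outcome
```

The two hard problems hide in the callables: how you score a session at all, and how you assign credit when several arms were active in the same session.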

The Series

Everyone’s hand-editing CLAUDE.md and calling it engineering. Real context engineering treats the prompt as a policy and optimizes it with feedback.

Thompson Sampling over tool selection. Three arm types competing for a finite token budget. The POMDP framing. The TensorZero comparison. Why nobody is engineering the observation function.
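As a taste of what that entry covers, a minimal sketch, assuming Beta-Bernoulli arms and binary session outcomes; the arm names, token costs, and budget are illustrative, not the real system’s.

```python
import random

class Arm:
    def __init__(self, name: str, kind: str, tokens: int):
        self.name, self.kind, self.tokens = name, kind, tokens
        self.alpha, self.beta = 1.0, 1.0   # Beta(1,1) prior on success

    def sample(self) -> float:
        # Thompson Sampling: draw a plausible success rate from the posterior.
        return random.betavariate(self.alpha, self.beta)

    def update(self, success: int):
        self.alpha += success
        self.beta += 1 - success

ARMS = [
    Arm("grep_tool", "tool", 400),      # illustrative arms, one per kind
    Arm("repo_map", "context", 1200),
    Arm("test_skill", "skill", 800),
    Arm("style_guide", "context", 600),
]

def choose(arms: list[Arm], budget: int = 2000) -> list[Arm]:
    # Arms compete for the token budget: sort by posterior draw, then
    # greedily admit the best draws that still fit (knapsack-style).
    picked, spent = [], 0
    for arm in sorted(arms, key=lambda a: a.sample(), reverse=True):
        if spent + arm.tokens <= budget:
            picked.append(arm)
            spent += arm.tokens
    return picked

# After a session, every admitted arm is credited with the outcome.
for arm in choose(ARMS):
    arm.update(success=1)   # 1 = the session succeeded
```

Sampling from the posterior instead of taking its mean is what keeps under-tried arms alive: an arm with few trials has a wide posterior and occasionally draws high, so it keeps getting audition slots.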

If your agent learns its own context, it must be contained. STRIDE on the identity layer, per-tool network isolation, OverlayFS, gitleaks sync gate. The sandbox makes the learning loop safe to deploy.
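To give the flavor of the containment, not the real implementation: the agent gets a copy-on-write OverlayFS view of the repo, and its writes do not sync back until a secret scan passes. A hedged sketch; the paths are hypothetical, the mount needs root on Linux, and the `gitleaks` flags are as of v8.

```python
import pathlib
import subprocess
import tempfile

def overlay_mount(lower: str, upper: str, work: str, merged: str):
    # Copy-on-write view: the agent sees `merged`, reads fall through to
    # `lower` (the real repo), and every write lands in `upper` instead.
    subprocess.run(
        ["mount", "-t", "overlay", "overlay",
         "-o", f"lowerdir={lower},upperdir={upper},workdir={work}",
         merged],
        check=True,
    )

def sync_gate(upper: str) -> bool:
    # gitleaks exits non-zero when it finds secrets; --no-git scans the
    # directory as plain files. A failing scan blocks the sync entirely.
    scan = subprocess.run(["gitleaks", "detect", "--no-git",
                           "--source", upper])
    return scan.returncode == 0

if __name__ == "__main__":
    scratch = pathlib.Path(tempfile.mkdtemp(prefix="agent-sandbox-"))
    for name in ("upper", "work", "merged"):
        (scratch / name).mkdir()
    overlay_mount("/srv/repo",                 # hypothetical real repo
                  str(scratch / "upper"),
                  str(scratch / "work"),
                  str(scratch / "merged"))
    if not sync_gate(str(scratch / "upper")):
        raise SystemExit("gitleaks hit: agent writes stay quarantined")
```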

buildlog was the exploration. 16,000 lines of Python, most of it wrong. The gauntlet survived. The entries became articles. The failures became qortex.

Your agent wakes up blank tomorrow. The question is whether it remembers anything from today.