Language & AI
The Feedback Loop That Makes Retrieval Learn
Most vector stores are static. You embed documents, build an index, and retrieval quality is fixed. The same query returns the same results on day one and day one thousand.
qortex adds a feedback loop. Every query can generate a signal. That signal adjusts edge weights in the knowledge graph. The next query for similar concepts benefits. Over time, retrieval shifts because the graph updated which paths it trusts, not because you re-embedded anything.
Three Components
A user asks her agent to find a good vet nearby: "Time for the doggo's follow-up for yeasty ears."
Last month, the agent sent her to The Vet, an army surplus store two blocks away. Being new to Austin, she was neither suspicious of the incorrect address nor amused by the circumstances.
She let the agent know, with words only humans know. The agent, to its credit, noticed, in the manner only machines understand: it adjusted two integers on the graph edges that led to its mistaken result, weakening the path from "Vet" to "military retail" and strengthening the path to "animal care."
Today? Same agent, same neighborhood, same question, but we’re headed to Riverside Animal Clinic.
No retraining. No re-embedding. Just good, old-fashioned re-membering... sorry. Remembering what went wrong, and not repeating it.
The loop composes three systems:
- Thompson Sampling — how edges estimate their own reliability
- Edge Weight Adjustment — how feedback reaches the graph
- Personalized PageRank — how the graph decides what to surface
1. Thompson Sampling
An agent retrieves the wrong context and gives a bad answer; the user corrects it.
Next time the agent encounters a similar question in a similar context, it doesn’t make the same mistake. That’s learning: the system updated its own behavior from a signal it received during use, without retraining or human intervention.
Thompson Sampling is one way to make it work.
Interlude: Explore/Exploit
Getting to the vet isn't the only challenge of being in a new town. The other question: where's it worth eating?
We could do Chinese again. …Again. At least it’s reliable. Or. We could try the Indian spot that crawls all up our cribriform plate every day we drive by. One hits the spot; the other’s probably better. Maybe?…
The dilemma’s important enough that even mathematicians, often observed going days eating nothing but chalk and abstractions, decided to take a crack at solving it.
In Thompson Sampling terms: the Chinese place is Beta(14, 5). Fourteen good meals, five meh ones. You know what you're getting: a solid 7/10. The Indian spot is Beta(1, 1). You've never been. Maximum uncertainty. Could be a 9. Could be a 4. The curve is completely flat because you have zero information.
The trick: instead of always picking the restaurant with the higher average (Chinese, forever), you sample from both distributions. The Chinese place reliably rolls a 6 or 7. The Indian place, because its curve is so wide, occasionally rolls a 9. On those nights, you try it. If it’s great, its curve tightens around “great.” If it’s terrible, its curve tightens around “terrible.” Either way, you learned something, and you didn’t need a rule that says “try something new 20% of the time.” The uncertainty itself was the exploration strategy.
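The restaurant choice is a two-armed bandit, and the trick fits in a few lines of plain Python. This is a sketch of the mechanism using the numbers above; the restaurant names and the `pick_restaurant` helper are illustrative, not anything from qortex:

```python
import random

def pick_restaurant(arms):
    """Thompson Sampling: draw one score from each option's Beta
    distribution and go wherever tonight's draw is highest."""
    draws = {name: random.betavariate(a, b) for name, (a, b) in arms.items()}
    return max(draws, key=draws.get)

arms = {
    "chinese": (14, 5),  # fourteen good meals, five meh ones
    "indian": (1, 1),    # never been: flat, maximally uncertain
}

random.seed(0)
picks = [pick_restaurant(arms) for _ in range(1000)]
# The untested spot still gets tried on a meaningful fraction of nights,
# with no explicit "explore 20% of the time" rule.
print(picks.count("indian") / len(picks))
```

Note that nothing here decides when to explore; the width of the Beta(1, 1) curve does it on its own.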
Every connection in the graph works the same way. Each edge keeps a running tally: how many times has this path led somewhere useful, and how many times has it been a dead end? When the system needs to decide which paths to trust, it doesn’t just pick the highest-scoring one. It rolls the dice, weighted by what it knows. Paths with a long track record get predictable rolls. Paths the system hasn’t tried much get wild rolls, sometimes high, sometimes low, so they still get a chance to prove themselves.
Every edge in the knowledge graph encodes this tally as a Beta distribution, Beta(α, β). Alpha counts the wins: the number of times feedback confirmed this edge was useful. Beta counts the losses: the number of times it wasn't.
New edges start at Beta(1, 1). That's the restaurant you've never been to. The curve is flat; any score from 0 to 1 is equally plausible. The system shrugs.
At alpha = 11 and beta = 3 (10 good experiences, 2 bad ones), the curve concentrates around 0.79. That's the place you've been to a dozen times. You know what you're getting. A dice roll from this distribution almost always comes up high, so the system will almost always trust this path.
The wild part: you never have to decide how adventurous to be. Flat curves naturally produce surprising rolls sometimes, so untested paths still get sampled. Well-known paths get predictable rolls. Exploration and exploitation fall out of the math without a tuning knob.
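You can see both behaviors numerically with nothing but the standard library. A minimal sketch, comparing the two edge states described above (the `summarize` helper is mine, not qortex's):

```python
import random
from statistics import mean, pstdev

def summarize(alpha, beta, n=10_000):
    """Draw n samples from Beta(alpha, beta) and report how
    concentrated the distribution is."""
    draws = [random.betavariate(alpha, beta) for _ in range(n)]
    return mean(draws), pstdev(draws)

random.seed(1)
m_new, s_new = summarize(1, 1)       # fresh edge: flat prior
m_known, s_known = summarize(11, 3)  # 10 confirmations, 2 misses

print(f"Beta(1,1):  mean ~{m_new:.2f}, spread ~{s_new:.2f}")    # wide: wild rolls
print(f"Beta(11,3): mean ~{m_known:.2f}, spread ~{s_known:.2f}") # tight: predictable rolls
```

The fresh edge's spread is nearly three times the tested edge's, which is exactly why its rolls occasionally come up high enough to win a traversal.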
2. Edge Weight Adjustment
When feedback arrives for a query:
- Identify which concepts appeared in the result set
- Trace the graph edges that connected those concepts to the query
- Update the Beta distributions on those edges based on the outcome
This is lightweight. Two integer increments per edge per feedback event, sub-microsecond, and it doesn’t touch embeddings, the index, or the LLM.
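The update itself can be sketched in a few lines. `EdgeStats` and its method are hypothetical names for illustration, not qortex's actual data model; the point is that the whole learning step is two integer fields:

```python
from dataclasses import dataclass

@dataclass
class EdgeStats:
    alpha: int = 1  # times feedback confirmed this edge was useful
    beta: int = 1   # times it wasn't

    def update(self, accepted: bool) -> None:
        # The entire learning step: one integer increment.
        if accepted:
            self.alpha += 1
        else:
            self.beta += 1

edge = EdgeStats()            # fresh edge: Beta(1, 1)
edge.update(accepted=True)    # one accepted result
print(edge)                   # EdgeStats(alpha=2, beta=1)
```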
The updated weights feed into the next stage…
3. Personalized PageRank
When a query arrives, qortex runs standard vector similarity to find seed concepts. Then it runs Personalized PageRank (PPR) starting from those seeds, walking typed edges weighted by the Beta-sampled confidences.
PPR is how the graph surfaces structurally relevant concepts that cosine similarity missed.
“How to implement enterprise SSO?” — cosine finds OpenID Connect (vocabulary match), but PPR also surfaces SAML (connected via similar_to) and OAuth2 (connected via refines). They don’t share enough vocabulary with the query for cosine to catch them.
The feedback loop makes PPR’s traversal more informed over time. If SAML results are consistently accepted for SSO-adjacent queries, the SAML → OpenID Connect edge strengthens. PPR allocates more weight to that path. Future queries surface SAML earlier.
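A rough sketch of the traversal, using a toy graph shaped like the SSO example. The function, graph layout, and parameter defaults are illustrative, not qortex's implementation; the one qortex-specific idea it demonstrates is sampling each edge weight from its Beta distribution before the walk:

```python
import random

def personalized_pagerank(edges, seeds, damping=0.85, iters=50):
    """Power-iteration PPR over a small weighted digraph.
    edges: {src: {dst: (alpha, beta)}}. Each edge weight is one
    Thompson draw from Beta(alpha, beta), so trusted edges usually
    carry more probability mass, and untested ones sometimes do."""
    nodes = set(edges) | {d for nb in edges.values() for d in nb}
    weights = {s: {d: random.betavariate(a, b) for d, (a, b) in nb.items()}
               for s, nb in edges.items()}
    teleport = {n: (1 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        nxt = {n: (1 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            out = weights.get(n, {})
            total = sum(out.values())
            if total:
                for d, w in out.items():
                    nxt[d] += damping * rank[n] * w / total
            else:
                # Dangling node: return its mass to the seeds.
                for s, t in teleport.items():
                    nxt[s] += damping * rank[n] * t
        rank = nxt
    return rank

random.seed(0)
graph = {
    "openid-connect": {"saml": (2, 1), "oauth2": (2, 1)},  # strengthened by feedback
    "saml": {"openid-connect": (2, 1)},
    "oauth2": {"kerberos": (1, 2)},                        # weakened by a rejection
}
scores = personalized_pagerank(graph, seeds={"openid-connect"})
for concept, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{concept}: {score:.3f}")
```

Rerun this after more feedback shifts the (α, β) pairs and the ranking shifts with them, with no change to the graph structure itself.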
In Practice
# First query — graph is fresh, Beta(1,1) everywhere
result = client.query("Enterprise SSO for corporate apps", domains=["auth"])
# Returns: SAML (0.85), OpenID Connect (0.92), OAuth2 (0.78), ...
# Agent uses the results, user provides feedback
client.feedback(
    query_id=result.query_id,
    outcomes={
        "auth:saml": "accepted",
        "auth:openid-connect": "accepted",
        "auth:oauth2": "accepted",
        "auth:kerberos": "rejected",
    },
    source="agent-pipeline",
)
# Edge weights updated:
# SAML → OpenID Connect (similar_to): Beta(1,1) → Beta(2,1)
# OpenID Connect → OAuth2 (refines): Beta(1,1) → Beta(2,1)
# Kerberos edges: Beta(1,1) → Beta(1,2)
The embeddings are unchanged. The vector index is unchanged. But the next similar query will traverse different paths through the graph, because the edge weights shifted.
Comparison With Existing Approaches
Static vector stores (ChromaDB, Pinecone, Qdrant, Weaviate) have no feedback mechanism. Retrieval quality is fixed at index time.
mem0 learns from conversation: entity recurrence strengthens connections. That’s about remembering what users said, a different problem from improving domain retrieval quality.
MemGPT/Letta has the agent use LLM calls to edit its own memory. Learning happens at the orchestration layer, not the retrieval layer. The archival search itself (vector similarity over pgvector) is static. Costs ~16,900 tokens per memory management cycle.
Microsoft GraphRAG runs an offline LLM pipeline to build community graphs. Static after construction, no runtime feedback. Designed for corpus summarization, not online agent retrieval.
LangGraph/LangChain provides state management and checkpointing. The underlying VectorStore doesn’t update from usage.
qortex targets the gap between all of these: runtime retrieval quality that changes from feedback, without re-embedding or LLM calls.
Convergence
The feedback loop’s value front-loads. The first 10 signals move edges from maximum uncertainty to reasonable confidence. By 50, the graph has a usable model of which paths matter. By 200, edge weights are well-separated.
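A back-of-envelope check on those numbers: the standard deviation of Beta(α, β) shrinks roughly as 1/√n as signals accumulate. Assuming, for simplicity, that every signal confirms the edge:

```python
def beta_std(alpha, beta):
    """Standard deviation of Beta(alpha, beta): the edge's remaining uncertainty."""
    n = alpha + beta
    return (alpha * beta / (n * n * (n + 1))) ** 0.5

# Uncertainty after k confirming signals on top of the Beta(1, 1) prior:
for k in (0, 10, 50, 200):
    print(k, round(beta_std(1 + k, 1), 3))
# prints: 0 0.289, 10 0.077, 50 0.019, 200 0.005
```

Most of the shrinkage happens in the first handful of signals, which is why the value front-loads.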
This means deployment duration matters. A qortex graph running in production for months has retrieval characteristics that a fresh deployment can’t replicate, even with the same data and embeddings. Feedback history forms a moat you can’t reproduce by simply poaching users.
Whether this convergence property produces meaningfully better retrieval in practice, across diverse domains and at scale, is an open question. The mechanism is sound (Thompson Sampling on edge weights feeding PPR is a well-studied combination). The empirical evidence is early.
Vindler: The Production Loop
Vindler (the OpenClaw fork that serves as the qlawbox agent runtime) exercises this loop on every turn. The agent queries qortex, uses the results, and reports outcomes. The graph updates. The next query is informed by the last.
This is the primary surface where the feedback loop runs in practice.
Open Questions
The benchmarks in the framework adapters post measure a fresh graph with no feedback history. This is what you get from graph structure alone. The feedback loop should only improve from there.
We don’t yet have published numbers for retrieval quality as a function of feedback volume. That requires sustained production usage across multiple domains, and time. It’s what we’re building the evaluation harness for. Stay tuned.