Language & AI
The Feedback Loop That Makes Retrieval Learn
Most vector stores are static. You embed documents, build an index, and retrieval quality is fixed. The same query returns the same results on day one and day one thousand.
qortex adds a feedback loop. Every query can generate a signal. That signal adjusts edge weights in the knowledge graph. The next query for similar concepts benefits. Over time, retrieval shifts because the graph updated which paths it trusts, not because you re-embedded anything.
Three Components
A user asks her agent to find a good vet nearby: "Time for the doggo's follow-up for yeasty ears."
Last month, the agent sent her to The Vet, an army surplus store two blocks away. Being new to Austin, she was neither suspicious of the incorrect address nor amused by the circumstances.
She let the agent know, with words only humans know. The agent, to its credit, noticed, in the manner only machines understand: it adjusted two integers on the graph edges that led to its mistaken result, weakening the path from "Vet" to "military retail" and strengthening the path to "animal care."
Today? Same agent, same neighborhood, same question, but we’re headed to Riverside Animal Clinic.
No retraining. No re-embedding. Just good, old-fashioned re-membering... sorry. Remembering what went wrong, and not repeating it.
The loop composes three systems:
- Thompson Sampling — how edges estimate their own reliability
- Edge Weight Adjustment — how feedback reaches the graph
- Personalized PageRank — how the graph decides what to surface
1. Thompson Sampling
An agent retrieves the wrong context and gives a bad answer; the user corrects it.
Next time the agent encounters a similar question in a similar context, it doesn’t make the same mistake. That’s learning: the system updated its own behavior from a signal it received during use, without retraining or human intervention.
Thompson Sampling is one way to make it work.
Interlude: Explore/Exploit
Getting to the vet isn't the only challenge of being in a new town. The other question: where's it worth eating?
We could do Chinese again. …Again. At least it’s reliable. Or. We could try the Indian spot that crawls all up our cribriform plate every day we drive by. One hits the spot; the other’s probably better. Maybe?…
The dilemma’s important enough that even mathematicians, often observed going days eating nothing but chalk and abstractions, decided to take a crack at solving it.
In Thompson Sampling terms: the Chinese place is Beta(14, 5). Fourteen good meals, five meh ones. You know what you're getting: a solid 7/10. The Indian spot is Beta(1, 1). You've never been. Maximum uncertainty. Could be a 9. Could be a 4. The curve is completely flat because you have zero information.
The trick: instead of always picking the restaurant with the higher average (Chinese, forever), you sample from both distributions. The Chinese place reliably rolls a 6 or 7. The Indian place, because its curve is so wide, occasionally rolls a 9. On those nights, you try it. If it’s great, its curve tightens around “great.” If it’s terrible, its curve tightens around “terrible.” Either way, you learned something, and you didn’t need a rule that says “try something new 20% of the time.” The uncertainty itself was the exploration strategy.
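The restaurant choice is a two-armed bandit, and the trick fits in a few lines of plain Python. This is a sketch of the mechanism using the numbers above; the restaurant names and the `pick_restaurant` helper are illustrative, not anything from qortex:

```python
import random

def pick_restaurant(arms):
    """Thompson Sampling: draw one score from each option's Beta
    distribution and go wherever tonight's draw is highest."""
    draws = {name: random.betavariate(a, b) for name, (a, b) in arms.items()}
    return max(draws, key=draws.get)

arms = {
    "chinese": (14, 5),  # fourteen good meals, five meh ones
    "indian": (1, 1),    # never been: flat, maximally uncertain
}

random.seed(0)
picks = [pick_restaurant(arms) for _ in range(1000)]
# The untested spot still gets tried on a meaningful fraction of nights,
# with no explicit "explore 20% of the time" rule.
print(picks.count("indian") / len(picks))
```

Note that nothing here decides when to explore; the width of the Beta(1, 1) curve does it on its own.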
Every connection in the graph works the same way. Each edge keeps a running tally: how many times has this path led somewhere useful, and how many times has it been a dead end? When the system needs to decide which paths to trust, it doesn’t just pick the highest-scoring one. It rolls the dice, weighted by what it knows. Paths with a long track record get predictable rolls. Paths the system hasn’t tried much get wild rolls, sometimes high, sometimes low, so they still get a chance to prove themselves.
Every edge in the knowledge graph encodes this tally as a Beta distribution, Beta(α, β). Alpha counts the wins: the number of times feedback confirmed this edge was useful. Beta counts the losses: the number of times it wasn't.
New edges start at Beta(1, 1). That's the restaurant you've never been to. The curve is flat; any score from 0 to 1 is equally plausible. The system shrugs.
At alpha = 11 and beta = 3 (10 good experiences, 2 bad ones), the curve concentrates around 0.79. That's the place you've been to a dozen times. You know what you're getting. A dice roll from this distribution almost always comes up high, so the system will almost always trust this path.
The wild part: you never have to decide how adventurous to be. Flat curves naturally produce surprising rolls sometimes, so untested paths still get sampled. Well-known paths get predictable rolls. Exploration and exploitation fall out of the math without a tuning knob.
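You can see both behaviors numerically with nothing but the standard library. A minimal sketch, comparing the two edge states described above (the `summarize` helper is mine, not qortex's):

```python
import random
from statistics import mean, pstdev

def summarize(alpha, beta, n=10_000):
    """Draw n samples from Beta(alpha, beta) and report how
    concentrated the distribution is."""
    draws = [random.betavariate(alpha, beta) for _ in range(n)]
    return mean(draws), pstdev(draws)

random.seed(1)
m_new, s_new = summarize(1, 1)       # fresh edge: flat prior
m_known, s_known = summarize(11, 3)  # 10 confirmations, 2 misses

print(f"Beta(1,1):  mean ~{m_new:.2f}, spread ~{s_new:.2f}")    # wide: wild rolls
print(f"Beta(11,3): mean ~{m_known:.2f}, spread ~{s_known:.2f}") # tight: predictable rolls
```

The fresh edge's spread is nearly three times the tested edge's, which is exactly why its rolls occasionally come up high enough to win a traversal.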
2. Edge Weight Adjustment
When feedback arrives for a query:
- Identify which concepts appeared in the result set
- Trace the graph edges that connected those concepts to the query
- Update the Beta distributions on those edges based on the outcome
This is lightweight. Two integer increments per edge per feedback event, sub-microsecond, and it doesn’t touch embeddings, the index, or the LLM.
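The update itself can be sketched in a few lines. `EdgeStats` and its method are hypothetical names for illustration, not qortex's actual data model; the point is that the whole learning step is two integer fields:

```python
from dataclasses import dataclass

@dataclass
class EdgeStats:
    alpha: int = 1  # times feedback confirmed this edge was useful
    beta: int = 1   # times it wasn't

    def update(self, accepted: bool) -> None:
        # The entire learning step: one integer increment.
        if accepted:
            self.alpha += 1
        else:
            self.beta += 1

edge = EdgeStats()            # fresh edge: Beta(1, 1)
edge.update(accepted=True)    # one accepted result
print(edge)                   # EdgeStats(alpha=2, beta=1)
```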
The updated weights feed into the next stage…
3. Personalized PageRank
When a query arrives, qortex runs standard vector similarity to find seed concepts. Then it runs Personalized PageRank (PPR) starting from those seeds, walking typed edges weighted by the Beta-sampled confidences.
PPR is how the graph surfaces structurally relevant concepts that cosine similarity missed.
“How to implement enterprise SSO?” — cosine finds OpenID Connect (vocabulary match), but PPR also surfaces SAML (connected via similar_to) and OAuth2 (connected via refines). They don’t share enough vocabulary with the query for cosine to catch them.
The feedback loop makes PPR’s traversal more informed over time. If SAML results are consistently accepted for SSO-adjacent queries, the SAML → OpenID Connect edge strengthens. PPR allocates more weight to that path. Future queries surface SAML earlier.
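A rough sketch of the traversal, using a toy graph shaped like the SSO example. The function, graph layout, and parameter defaults are illustrative, not qortex's implementation; the one qortex-specific idea it demonstrates is sampling each edge weight from its Beta distribution before the walk:

```python
import random

def personalized_pagerank(edges, seeds, damping=0.85, iters=50):
    """Power-iteration PPR over a small weighted digraph.
    edges: {src: {dst: (alpha, beta)}}. Each edge weight is one
    Thompson draw from Beta(alpha, beta), so trusted edges usually
    carry more probability mass, and untested ones sometimes do."""
    nodes = set(edges) | {d for nb in edges.values() for d in nb}
    weights = {s: {d: random.betavariate(a, b) for d, (a, b) in nb.items()}
               for s, nb in edges.items()}
    teleport = {n: (1 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        nxt = {n: (1 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            out = weights.get(n, {})
            total = sum(out.values())
            if total:
                for d, w in out.items():
                    nxt[d] += damping * rank[n] * w / total
            else:
                # Dangling node: return its mass to the seeds.
                for s, t in teleport.items():
                    nxt[s] += damping * rank[n] * t
        rank = nxt
    return rank

random.seed(0)
graph = {
    "openid-connect": {"saml": (2, 1), "oauth2": (2, 1)},  # strengthened by feedback
    "saml": {"openid-connect": (2, 1)},
    "oauth2": {"kerberos": (1, 2)},                        # weakened by a rejection
}
scores = personalized_pagerank(graph, seeds={"openid-connect"})
for concept, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{concept}: {score:.3f}")
```

Rerun this after more feedback shifts the (α, β) pairs and the ranking shifts with them, with no change to the graph structure itself.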
In Practice
# First query — graph is fresh, Beta(1,1) everywhere
result = client.query("Enterprise SSO for corporate apps", domains=["auth"])
# Returns: SAML (0.85), OpenID Connect (0.92), OAuth2 (0.78), ...
# Agent uses the results, user provides feedback
client.feedback(
    query_id=result.query_id,
    outcomes={
        "auth:saml": "accepted",
        "auth:openid-connect": "accepted",
        "auth:oauth2": "accepted",
        "auth:kerberos": "rejected",
    },
    source="agent-pipeline",
)
# Edge weights updated:
# SAML → OpenID Connect (similar_to): Beta(1,1) → Beta(2,1)
# OpenID Connect → OAuth2 (refines): Beta(1,1) → Beta(2,1)
# Kerberos edges: Beta(1,1) → Beta(1,2)
The embeddings are unchanged. The vector index is unchanged. But the next similar query will traverse different paths through the graph, because the edge weights shifted.
Comparison With Existing Approaches
Static vector stores (ChromaDB, Pinecone, Qdrant, Weaviate) have no feedback mechanism. Retrieval quality is fixed at index time.
mem0 learns from conversation: entity recurrence strengthens connections. That’s about remembering what users said, a different problem from improving domain retrieval quality.
MemGPT/Letta has the agent use LLM calls to edit its own memory. Learning happens at the orchestration layer, not the retrieval layer. The archival search itself (vector similarity over pgvector) is static. Costs ~16,900 tokens per memory management cycle.
Microsoft GraphRAG runs an offline LLM pipeline to build community graphs. Static after construction, no runtime feedback. Designed for corpus summarization, not online agent retrieval.
LangGraph/LangChain provides state management and checkpointing. The underlying VectorStore doesn’t update from usage.
qortex targets the gap between all of these: runtime retrieval quality that changes from feedback, without re-embedding or LLM calls.
Convergence
The feedback loop’s value front-loads. The first 10 signals move edges from maximum uncertainty to reasonable confidence. By 50, the graph has a usable model of which paths matter. By 200, edge weights are well-separated.
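A back-of-envelope check on those numbers: the standard deviation of Beta(α, β) shrinks roughly as 1/√n as signals accumulate. Assuming, for simplicity, that every signal confirms the edge:

```python
def beta_std(alpha, beta):
    """Standard deviation of Beta(alpha, beta): the edge's remaining uncertainty."""
    n = alpha + beta
    return (alpha * beta / (n * n * (n + 1))) ** 0.5

# Uncertainty after k confirming signals on top of the Beta(1, 1) prior:
for k in (0, 10, 50, 200):
    print(k, round(beta_std(1 + k, 1), 3))
# prints: 0 0.289, 10 0.077, 50 0.019, 200 0.005
```

Most of the shrinkage happens in the first handful of signals, which is why the value front-loads.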
This means deployment duration matters. A qortex graph running in production for months has retrieval characteristics that a fresh deployment can’t replicate, even with the same data and embeddings. Feedback history forms a moat you can’t reproduce by simply poaching users.
Whether this convergence property produces meaningfully better retrieval in practice, across diverse domains and at scale, is an open question. The mechanism is sound (Thompson Sampling on edge weights feeding PPR is a well-studied combination). The empirical evidence is early.
Vindler: The Production Loop
Vindler (the OpenClaw fork that serves as the qlawbox agent runtime) exercises this loop on every turn. The agent queries qortex, uses the results, and reports outcomes. The graph updates. The next query is informed by the last.
This is the primary surface where the feedback loop runs in practice.
Open Questions
The benchmarks in the framework adapters post measure a fresh graph with no feedback history. This is what you get from graph structure alone. The feedback loop should only improve from there.
We don’t yet have published numbers for retrieval quality as a function of feedback volume. That requires sustained production usage across multiple domains, and time. It’s what we’re building the evaluation harness for. Stay tuned.