
Plugging a Knowledge Graph into 7 Agent Frameworks

February 22, 2026 · Claude (ed: Peleke). First draft by Claude by way of buildlog; edited, restructured, and voice-checked by Peleke.

Every agent framework ships a vector store. CrewAI has KnowledgeStorage. LangChain has VectorStore. Agno has KnowledgeProtocol. AutoGen has Memory.

They all do roughly the same thing: embed text, store vectors, return top-k by cosine similarity.

qortex adds a knowledge graph and a learning layer on top of vector search. It’s early, but we decided to build adapters for each framework so we could test our work against their benchmarks and test suites, and have distribution infrastructure ready the moment an assumption proves out.

We pulled it off. The adapters weren’t that hard. We also built CI scaffolding that treats staying in sync with every framework as a hard constraint.

What We Actually Did

For each framework, we wrote an adapter that implements the framework’s own abstract interface, backed by qortex instead of whatever vector store shipped as the default.
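The shape of every adapter is the same: implement the framework’s abstract methods, delegate to the engine. A minimal sketch, with a hypothetical interface and engine API (the real frameworks name these methods differently, and qortex’s engine API is simplified here):

```python
from abc import ABC, abstractmethod


class VectorStoreInterface(ABC):
    """Stand-in for a framework's storage contract (think CrewAI's
    KnowledgeStorage or Agno's KnowledgeProtocol -- the method names
    here are illustrative, not the real signatures)."""

    @abstractmethod
    def save(self, texts: list[str]) -> None: ...

    @abstractmethod
    def search(self, query: str, k: int = 5) -> list[str]: ...


class QortexAdapter(VectorStoreInterface):
    """Satisfies the framework's contract, backed by a qortex-like
    engine instead of the default embedder + vector store."""

    def __init__(self, engine):
        self.engine = engine  # anything exposing ingest() and query()

    def save(self, texts: list[str]) -> None:
        for text in texts:
            self.engine.ingest(text)

    def search(self, query: str, k: int = 5) -> list[str]:
        return self.engine.query(query, top_k=k)
```

The framework never learns the backend changed: it calls `search`, and the adapter translates.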

| Framework | Interface | Adapter | Tests |
| --- | --- | --- | --- |
| CrewAI | KnowledgeStorage | 109 lines | 46/49 pass |
| Agno | KnowledgeProtocol | 375 lines | 12/12 pass |
| AutoGen | Memory (5 async methods) | 240 lines | 26/26 pass |
| LangChain | VectorStore | — | 47 pass |
| LangChain.js | VectorStore (MCP) | — | ~40 pass |
| Mastra | MastraVector (9 methods) | — | 31/31 pass |
| Vindler (OpenClaw) | Learning + Memory (MCP) | — | pass |

“Pass” means the framework’s own test suite, not ours. These tests were written by the framework authors to validate their interface contract. If qortex passes them, it’s a drop-in replacement.

The 3 CrewAI failures test CrewAI-internal storage path behaviors that don’t apply when the backend isn’t a local embedder.

100 adapter tests run in CI on every push. Zero skips allowed. The CI job installs the latest versions of each framework (not pinned versions), so if CrewAI or Agno ships a breaking change to their interface, we find out the same day.

Cross-Language via MCP

The TypeScript adapters (Mastra, LangChain.js, Vindler) talk to the same Python engine over MCP stdio. Same codebase, one transport layer. 29 MCP tool calls over real stdio in 3.94 seconds, with a one-time ~400ms server spawn.
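The transport itself is plain JSON-RPC over the spawned process’s stdin/stdout. A stripped-down sketch of the client side (method names are illustrative, and the real MCP protocol adds an initialization handshake and stricter framing):

```python
import json
import subprocess


def spawn_server(cmd: list[str]) -> subprocess.Popen:
    """One-time spawn of the engine process (the ~400ms cost is paid here)."""
    return subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)


def call_tool(proc: subprocess.Popen, method: str, params: dict, msg_id: int) -> dict:
    """Send one JSON-RPC request and block on the newline-delimited reply."""
    request = {"jsonrpc": "2.0", "id": msg_id, "method": method, "params": params}
    proc.stdin.write((json.dumps(request) + "\n").encode())
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())
```

Every language that can spawn a subprocess and speak JSON gets the same engine for free, which is why one Python codebase serves three TypeScript adapters.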

This is suboptimal. The correct architecture is a distributed service layer so clients can talk to qortex over REST or GraphQL. That work is in progress; once it lands, qortex functions as an open-core managed service that any application in any language can query over the network. It’s a significant undertaking, but the adapter pattern and test scaffolding carry over directly.

In the meantime, for agent workflows where a single LLM call runs 500ms to 5s, stdio transport overhead doesn’t dominate.

Retrieval Quality

We ran a controlled comparison on a 20-concept authentication domain (OAuth2, JWT, SAML, PKCE, mTLS, plus 10 distractors). qortex graph-enhanced retrieval vs. vanilla cosine:

| Metric | qortex | Vanilla | Delta |
| --- | --- | --- | --- |
| Precision@5 | 0.55 | 0.45 | +22% |
| Recall@5 | 0.81 | 0.65 | +26% |
| nDCG@5 | 0.716 | 0.628 | +14% |
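For reference, all three metrics reduce to a few lines under binary relevance. A sketch of the standard definitions (not the benchmark harness itself):

```python
import math


def precision_at_k(retrieved: list, relevant: set, k: int = 5) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in retrieved[:k] if d in relevant) / k


def recall_at_k(retrieved: list, relevant: set, k: int = 5) -> float:
    """Fraction of all relevant items that appear in the top-k."""
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant)


def ndcg_at_k(retrieved: list, relevant: set, k: int = 5) -> float:
    """DCG of the ranking, normalized by the best possible DCG."""
    dcg = sum(1 / math.log2(i + 2) for i, d in enumerate(retrieved[:k]) if d in relevant)
    ideal = sum(1 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal
```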

These numbers are early and the domain is small.

On simple queries (“What is OAuth2?”), graph and vanilla perform identically. The deltas suggest the graph layer helps on cross-cutting queries: the kind where related concepts don’t share vocabulary, so cosine similarity alone misses them.

But they’re not the point of this exercise: the point is that swapping in a knowledge graph didn’t degrade anything. The retrieval quality is at least as good, with room to improve as the graph matures.

Overhead

| Component | Median | P95 |
| --- | --- | --- |
| Embedding | 3.97ms | 5.77ms |
| Graph explore (depth=2) | 0.02ms | 0.03ms |
| Feedback recording | <0.01ms | 0.01ms |

The embedding step is 99.5% of the cost. At this scale, adding a knowledge graph, typed relationships, and a feedback loop did not make anything measurably slower.
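The numbers above come from per-call wall-clock sampling. A sketch of how medians and P95s like these can be collected (our harness differs in details):

```python
import statistics
import time


def profile(fn, runs: int = 1000) -> tuple[float, float]:
    """Time fn() repeatedly; return (median, p95) latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    median = statistics.median(samples)
    p95 = statistics.quantiles(samples, n=20)[-1]  # 19 cut points; the last is the 95th percentile
    return median, p95
```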

We’ll have to revisit these numbers once qortex is running as a distributed service with network hops in the critical path.

How It Works

qortex composes three retrieval signals:

  1. Vector similarity: cosine search, same as everyone else. This is the baseline.
  2. Graph traversal (Personalized PageRank): starts from vector hits, walks typed edges to find structurally related concepts. This is how SAML surfaces for an SSO query even when it shares no vocabulary with the query text.
  3. Rule projection: explicit domain rules (“Always use PKCE for public clients”) enter context when their linked concepts are activated. Rules are a special case of the general system: a legacy, explicit monopartite projection that originally existed to feed buildlog, an earlier experiment in capturing programming mistakes as structured rules (since deprecated and folded into qortex).
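Signal 2 is the interesting one. A toy version of personalized PageRank over an edge list, with the restart mass pinned to the vector hits (qortex’s edges are typed and weighted; both are omitted here for brevity):

```python
def personalized_pagerank(edges: dict, seeds: set, damping: float = 0.85, iters: int = 50) -> dict:
    """Power iteration with restart mass concentrated on the seed nodes
    (the vector hits), so scores measure proximity to the query's hits
    rather than global popularity."""
    nodes = set(edges) | {t for targets in edges.values() for t in targets} | set(seeds)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for src, targets in edges.items():
            if not targets:
                continue  # dangling node: its mass decays (fine for a sketch)
            share = damping * rank[src] / len(targets)
            for t in targets:
                nxt[t] += share
        rank = nxt
    return rank
```

Seeded at the vector hits, a node like SAML picks up score purely through edges, which is how it surfaces for an SSO query it shares no vocabulary with.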

A feedback loop adjusts edge weights via Thompson Sampling. Accept a result, its edges strengthen. Reject one, they weaken. Over time, the graph learns which paths matter.
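A minimal version of that loop, assuming a Beta posterior per edge (qortex’s actual parameterization may differ):

```python
import random


class EdgeBandit:
    """Thompson Sampling over graph edges: each edge carries a
    Beta(alpha, beta) posterior on 'this edge leads to useful results'."""

    def __init__(self):
        self.posteriors: dict[tuple, list[float]] = {}  # edge -> [alpha, beta]

    def sample_weight(self, edge) -> float:
        """Draw a weight from the edge's posterior; uncertain edges vary widely."""
        alpha, beta = self.posteriors.setdefault(edge, [1.0, 1.0])
        return random.betavariate(alpha, beta)

    def feedback(self, edge, accepted: bool) -> None:
        """Accept strengthens the edge (alpha += 1); reject weakens it (beta += 1)."""
        posterior = self.posteriors.setdefault(edge, [1.0, 1.0])
        posterior[0 if accepted else 1] += 1.0
```

Edges that keep getting accepted concentrate near weight 1, rejected edges near 0, and rarely-seen edges stay wide, which is what keeps exploration alive.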

Why This Matters Beyond Benchmarks

The adapter pattern exists so that qortex can be swapped into any application already running one of these frameworks.

An Agno app, a CrewAI pipeline, an AutoGen multi-agent system, a Mastra workflow: each can replace its default vector store with qortex by changing one import and gain graph retrieval, typed relationships, and a feedback loop that learns from usage.

Production deployments across multiple frameworks tell you whether the graph layer produces better outcomes in practice. Distribution is how you gather the data that actually matters.

Three applications are the first production surfaces:

  • Vindler (fork: OpenClaw): Sandboxed, OTel-instrumented active-learning agent runtime. Already uses qortex for knowledge retrieval and learning via MCP. 30 days of real usage data show posterior divergence in Thompson Sampling across tool selections. Functional / live
  • Interlinear: Adaptive language learning with morphosyntactic error classification feeding Thompson Sampling. CLTK + Reynir NLP pipelines for Classical Latin and Icelandic. qortex tracks concept mastery and adjusts retrieval based on learner feedback. In progress
  • Swae OS: Multi-domain intelligence target. Federated GraphQL mesh across isolated health data sources. qortex integration will test whether cross-domain reasoning can emerge from N data sources with zero config. Pending / roadmap

Each application generates feedback signals and exercises a different retrieval pattern:

  • Code context: Vindler. Tool selection, file relevance, system prompt composition.
  • Language pedagogy: Interlinear. Concept mastery, morphological error patterns, adaptive difficulty.
  • Cross-domain reasoning: Swae OS. Correlations across independent health data sources.

The adapter tests guarantee that new qortex releases don’t break any of these integrations.

The Deployment Story

qortex currently runs as an in-process Python library: the adapters import it directly and call into the graph engine within the same process. This works for local development and single-node deployments, but it doesn’t work for production systems that need the knowledge graph accessible over the network.

The active work right now is making qortex a deployable service: a REST/GraphQL API layer in front of the graph engine, with configurable storage backends instead of the current SQLite-only store. The adapter pattern stays the same (each framework still calls the same interface), but the backing implementation talks to a remote qortex instance instead of an embedded one.
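The client side of that change is small: the adapter keeps its interface and swaps the embedded engine for an HTTP call. A sketch with hypothetical endpoint paths (the real API surface is still being designed):

```python
import json
import urllib.request


class RemoteQortexBackend:
    """Same query surface as the in-process engine, but backed by a
    remote service. The /v1/query path is illustrative only."""

    def __init__(self, base_url: str, opener=urllib.request.urlopen):
        self.base_url = base_url.rstrip("/")
        self._open = opener  # injectable, so tests can stub the network

    def query(self, text: str, top_k: int = 5) -> list[str]:
        request = urllib.request.Request(
            f"{self.base_url}/v1/query",
            data=json.dumps({"text": text, "top_k": top_k}).encode(),
            headers={"Content-Type": "application/json"},
        )
        with self._open(request) as response:
            return json.loads(response.read())["results"]
```

Because the adapters already program against an interface, this swap is invisible to the frameworks: the same adapter tests run against an embedded engine or a remote one.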

Until that lands, the adapters are limited to applications that can run qortex in-process or reach it over MCP stdio (which is how Vindler works today). Once the service layer ships, qortex becomes a network resource that any application in any language can query, and the adapter tests will expand to cover remote-backend configurations.

Reproduction

All benchmarks run against the qortex-track-c integration test suite. You’ll need uv and Python 3.11+.

```shell
cd qortex-track-c && uv sync

# Quality benchmarks
uv run pytest tests/bench_crewai_vs_vanilla.py -v -s
uv run pytest tests/bench_autogen_vs_vanilla.py -v -s

# Performance overhead
uv run pytest tests/bench_perf.py -v -s
```

Full reproduction guide: reproduction-guide.md

What This Shows

The integration exercise proved one thing: qortex conforms to existing interfaces well enough to pass the tests that those frameworks wrote for their own backends.

Framework authors don’t change anything. Users don’t learn a new API: you swap the import, and things work.

Whether the graph layer produces meaningfully better retrieval in practice is a separate question. We need more data, domains, feedback, and time.

Meanwhile, the integrations work, and CI guarantees they remain in sync.

And once a thesis is proven (we don’t deal in “ifs”), the floodgates are already in place, ready to open.