Plugging a Knowledge Graph into 7 Agent Frameworks
Every agent framework ships a vector store. CrewAI has KnowledgeStorage. LangChain has VectorStore. Agno has KnowledgeProtocol. AutoGen has Memory.
They all do roughly the same thing: embed text, store vectors, return top-k by cosine similarity.
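For concreteness, here is that baseline in miniature. This is a sketch, not any framework's actual code; the random 384-dim vectors stand in for real embeddings:

```python
import numpy as np

def top_k(query_vec: np.ndarray, store: np.ndarray, k: int = 5):
    # Normalize rows so a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    s = store / np.linalg.norm(store, axis=1, keepdims=True)
    sims = s @ q
    order = np.argsort(-sims)[:k]
    return order, sims[order]

store = np.random.rand(100, 384)          # 100 docs, 384-dim embeddings
indices, scores = top_k(np.random.rand(384), store, k=5)
```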
qortex adds a knowledge graph and a learning layer on top of vector search. We built adapters for each framework so we could test our work against their benchmarks and test suites, and have distribution infrastructure ready the moment an assumption proves out.
What We Actually Did
For each framework, we wrote an adapter that implements the framework’s own abstract interface, backed by qortex instead of whatever vector store shipped as the default.
| Framework | Interface | Adapter | Tests |
|---|---|---|---|
| CrewAI | KnowledgeStorage | 109 lines | 46/49 pass |
| Agno | KnowledgeProtocol | 375 lines | 12/12 pass |
| AutoGen | Memory (5 async methods) | 240 lines | 26/26 pass |
| LangChain | VectorStore | — | 47 pass |
| LangChain.js | VectorStore (MCP) | — | ~40 pass |
| Mastra | MastraVector (9 methods) | — | 31/31 pass |
| Vindler (OpenClaw) | Learning + Memory (MCP) | — | pass |
“Pass” means the framework’s own test suite, not ours. These tests were written by the framework authors to validate their interface contract. If qortex passes them, it’s a drop-in replacement.
- The 3 CrewAI failures test CrewAI-internal storage path behaviors that don't apply when the backend isn't a local embedder. CrewAI recently replaced its DB layer; the adapter still passes because it targets the published KnowledgeStorage interface contract, not the internal storage implementation.
- 100 adapter tests run in CI on every push. Zero skips allowed.
- The CI job installs the latest versions of each framework (not pinned versions), so if CrewAI or Agno ships a breaking change to their interface, we find out the same day.
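To make the pattern concrete, here is the rough shape of one of these adapters. The base class below mimics the save/search contract of an interface like CrewAI's KnowledgeStorage, and the client is a hypothetical stand-in for the real qortex client; neither is the published API.

```python
from abc import ABC, abstractmethod
from typing import Any

class BaseKnowledgeStorage(ABC):
    """Stand-in for a framework interface such as CrewAI's KnowledgeStorage."""
    @abstractmethod
    def save(self, documents: list[str]) -> None: ...
    @abstractmethod
    def search(self, query: str, limit: int = 5) -> list[dict[str, Any]]: ...

class QortexKnowledgeStorage(BaseKnowledgeStorage):
    """Same contract as the default backend, backed by qortex instead."""
    def __init__(self, client: Any) -> None:
        self._client = client  # hypothetical QortexClient

    def save(self, documents: list[str]) -> None:
        for doc in documents:
            self._client.ingest(doc)  # graph + vector ingest in one call

    def search(self, query: str, limit: int = 5) -> list[dict[str, Any]]:
        hits = self._client.query(query, top_k=limit)
        return [{"context": h["text"], "score": h["score"]} for h in hits]
```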
Cross-Language via MCP
The TypeScript adapters (Mastra, LangChain.js, Vindler) talk to the same Python engine over MCP stdio. Same codebase, one transport layer. 29 MCP tool calls over real stdio in 3.94 seconds, with a one-time ~400ms server spawn.
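For reference, an MCP stdio round trip looks like this with the reference Python SDK. The server command and the qortex_query tool name are assumptions; the TypeScript adapters do the equivalent from Node:

```python
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Spawn the engine as a subprocess and talk to it over stdio.
    params = StdioServerParameters(command="qortex", args=["mcp"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()  # the one-time ~400ms spawn cost
            result = await session.call_tool(
                "qortex_query", {"text": "What is OAuth2?", "top_k": 5})
            print(result.content)

asyncio.run(main())
```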
This is no longer the only option. qortex now ships a REST API (qortex serve): a Starlette ASGI server with 35+ endpoints, API key and HMAC-SHA256 auth.
The async HttpQortexClient speaks the same protocol interface as the local client. Any application in any language can query qortex over the network; the adapter pattern and test scaffolding carry over directly.
For agent workflows where a single LLM call runs 500ms to 5s, stdio transport overhead doesn’t dominate either way. But for cross-language consumers (like Next.js apps), the REST API eliminates subprocess management entirely.
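A minimal REST consumer might look like the following. The endpoint path, payload shape, and header name are assumptions; only qortex serve, API-key auth, and the async client are given above:

```python
import asyncio
import httpx

async def query_qortex(prompt: str, top_k: int = 5) -> list[dict]:
    async with httpx.AsyncClient(base_url="http://localhost:8000") as http:
        resp = await http.post(
            "/query",                           # hypothetical endpoint
            json={"text": prompt, "top_k": top_k},
            headers={"X-API-Key": "changeme"},  # hypothetical header name
        )
        resp.raise_for_status()
        return resp.json()["results"]

results = asyncio.run(query_qortex("What is OAuth2?"))
```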
Retrieval Quality
We ran a controlled comparison on a 20-concept authentication domain (OAuth2, JWT, SAML, PKCE, mTLS, plus 10 distractors). qortex graph-enhanced retrieval vs. vanilla cosine:
| Metric | qortex | Vanilla | Delta |
|---|---|---|---|
| Precision@5 | 0.55 | 0.45 | +22% |
| Recall@5 | 0.81 | 0.65 | +26% |
| nDCG@5 | 0.716 | 0.628 | +14% |
On simple queries (“What is OAuth2?”), graph and vanilla perform identically. The deltas suggest the graph layer helps on cross-cutting queries (the kind where related concepts don’t share vocabulary, so cosine similarity alone misses them), but it would be disingenuous to claim a test at this scale demonstrates a meaningful effect.
But that’s not the point of this exercise: the point is that swapping in a knowledge graph didn’t degrade anything. The retrieval quality is at least as good, with room to improve as the graph matures.
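For anyone rechecking the table, the three metrics are the standard rank metrics with binary relevance. A sketch of the scoring, not the benchmark harness itself:

```python
import math

def rank_metrics(retrieved: list[str], relevant: set[str], k: int = 5):
    """Precision@k, Recall@k, and nDCG@k with binary relevance."""
    hits = [1 if doc in relevant else 0 for doc in retrieved[:k]]
    precision = sum(hits) / k
    recall = sum(hits) / len(relevant)
    dcg = sum(h / math.log2(i + 2) for i, h in enumerate(hits))
    idcg = sum(1 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return precision, recall, dcg / idcg

# Toy run: 3 relevant concepts, 5 retrieved.
print(rank_metrics(["OAuth2", "PKCE", "SAML", "mTLS", "JWT"],
                   relevant={"OAuth2", "PKCE", "JWT"}))
```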
Overhead
| Component | Median | P95 |
|---|---|---|
| Embedding | 3.97ms | 5.77ms |
| Graph explore (depth=2) | 0.02ms | 0.03ms |
| Feedback recording | <0.01ms | 0.01ms |
The embedding step is 99.5% of the cost. So far, adding a knowledge graph, typed relationships, and a feedback loop has not made anything measurably slower.
Now that qortex runs as a distributed service with network hops in the critical path, these numbers will shift: benchmarks against the REST API are forthcoming.
How It Works
qortex composes three retrieval signals:
- Vector similarity: cosine search, same as everyone else. This is the baseline.
- Graph traversal (Personalized PageRank): starts from vector hits, walks typed edges to find structurally related concepts. This is how SAML surfaces for an SSO query even when it shares no vocabulary with the query text.
- Rule projection: explicit domain rules (“Always use PKCE for public clients”) enter context when their linked concepts are activated. Rules are a special case of the general system: a legacy, explicit monopartite projection that exists primarily to feed buildlog, an earlier experiment in capturing programming mistakes as structured rules (since deprecated and folded into qortex). Both traversal and rule projection are sketched below.
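A minimal sketch of the second and third signals, using networkx for the Personalized PageRank step. The toy graph, edge types, and activation threshold are illustrative; qortex's real schema and parameters are not shown here:

```python
import networkx as nx

# Toy concept graph with typed edges.
G = nx.DiGraph()
G.add_edge("SSO", "SAML", relation="implemented_by")
G.add_edge("SSO", "OAuth2", relation="implemented_by")
G.add_edge("OAuth2", "PKCE", relation="hardened_by")
G.add_edge("OAuth2", "JWT", relation="issues")

# Pretend these scores came back from cosine search over the query.
vector_hits = {"SSO": 0.91, "OAuth2": 0.62}

# Personalized PageRank: restart mass concentrated on the vector hits,
# so SAML surfaces for an SSO query despite sharing no vocabulary.
scores = nx.pagerank(G, alpha=0.85, personalization=vector_hits)

# Rule projection: a rule enters context when its linked concept activates.
rules = {"PKCE": "Always use PKCE for public clients"}
activated = {c for c, s in scores.items() if s > 0.10}
context_rules = [rule for concept, rule in rules.items() if concept in activated]
```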
A feedback loop adjusts edge weights via Thompson Sampling. When a result is accepted, the edges that led to it get a small boost. When it’s rejected, they get penalized. Over time, the paths that consistently lead to good results pull ahead.
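The shape of that loop, assuming a Beta-Bernoulli bandit per edge (qortex's actual priors and update rule are not shown here):

```python
import random

class EdgeBandit:
    """One Beta-Bernoulli arm per graph edge (illustrative sketch)."""
    def __init__(self) -> None:
        self.alpha = 1.0  # accepted outcomes + 1
        self.beta = 1.0   # rejected outcomes + 1

    def sample_weight(self) -> float:
        # Draw a plausible edge weight from the posterior.
        return random.betavariate(self.alpha, self.beta)

    def record(self, accepted: bool) -> None:
        if accepted:
            self.alpha += 1.0  # edges behind a good result get a boost
        else:
            self.beta += 1.0   # edges behind a rejected result get penalized

edges = {("SSO", "SAML"): EdgeBandit(), ("SSO", "OAuth2"): EdgeBandit()}
# At traversal time, prefer the edge whose sampled weight is highest;
# consistently good paths pull ahead as their posteriors sharpen.
best = max(edges, key=lambda e: edges[e].sample_weight())
edges[best].record(accepted=True)  # feedback from the consuming agent
```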
Our hope is that the composition of these layers (vector similarity, graph traversal, feedback-driven weight adjustment) is general enough to support a range of cognitive architectures, and that the learning signal across domains gives us the tooling to derive them on the fly. We’re not sure yet. That’s what we’re testing.
Why This Matters Beyond Benchmarks
The adapter pattern exists so that qortex can be swapped into any application already running one of these frameworks. That interoperability isn’t something you get for free; it has to be built and verified against each framework’s contract.
Any HTTP client can query qortex through the REST API. Any framework in Python can run it as a drop-in replacement for their existing vector stores.
The real test isn’t benchmarks; it’s production. Multiple frameworks, multiple domains, real users generating real feedback. That’s how you find out whether the graph layer actually earns its keep.
Three applications are the first production consumers:
- Sandboxed, OTel-instrumented active-learning agent runtime. Uses qortex for knowledge retrieval and learning via MCP. Retrieval pattern: code context (tool selection, file relevance, system prompt composition).
- Adaptive language learning with morphosyntactic error classification feeding Thompson Sampling. CLTK + Reynir NLP pipelines for Classical Latin and Icelandic. qortex tracks concept mastery and adjusts retrieval based on learner feedback. Retrieval pattern: language pedagogy (concept mastery, morphological error patterns, adaptive difficulty).
- Federated GraphQL mesh across isolated health data sources. qortex integration will test whether cross-domain reasoning can emerge from N data sources with zero config. Retrieval pattern: cross-domain reasoning (correlations across independent health data sources).
Each application generates feedback signals by exercising a different retrieval pattern.
Adapter tests in CI guarantee we’ll know ASAP when the integrations break.
Reproduction
All benchmarks run against the qortex-track-c integration test suite. You’ll need uv and Python 3.11+.
```bash
cd qortex-track-c && uv sync

# Quality benchmarks
uv run pytest tests/bench_crewai_vs_vanilla.py -v -s
uv run pytest tests/bench_autogen_vs_vanilla.py -v -s

# Performance overhead
uv run pytest tests/bench_perf.py -v -s
```
Full reproduction guide: reproduction-guide.md
What This Shows
qortex conforms to existing interfaces well enough to pass the tests that those frameworks wrote for their own backends. Swapping the import is the only change required.
Whether the graph layer produces meaningfully better retrieval in practice is a separate question. We need more data, more domains, more feedback. CI guarantees the integrations remain in sync while we collect it.