Building a Multi-Agent RAG System Inside Obsidian
Architecture notes on an Obsidian plugin that orchestrates specialist agents over a personal knowledge base with dual-layer memory.
The standard RAG pattern — embed documents, retrieve top-k, stuff into context — breaks down when the query space is personal and longitudinal. “What did I think about X six months ago?” requires temporal reasoning, not just semantic similarity.
Two-layer memory
The system splits memory into two layers:
Short-term: raw notes, recent conversations, today’s captures. High churn, not worth indexing deeply.
Long-term: specialist agents (Alice for user profile, Silas for domain knowledge, Nova for strategy history) maintain curated summaries in their own workspaces.
Queries route to the appropriate layer before hitting the vector store.
Orchestration
A central agent (Ayu) receives the user’s query, determines which specialist to consult, dispatches sub-queries, and synthesises the response. The specialists never see each other’s state — they only answer what they’re asked.
This isolation keeps each agent’s context small and prevents cross-contamination of different knowledge domains.
What surprised me
The bottleneck isn’t retrieval quality — it’s ingestion discipline. The system is only as good as what gets captured. The interface to put things in matters more than the interface to get things out.