The Swarm Brain (RAG)

Curing Agentic Amnesia with FAISS Vector Databases and Semantic Memory.

A fatal flaw of early autonomous agents is their utter lack of temporal awareness. They exist perfectly in the moment of their instantiation, solving the exact bug defined in their Markdown ticket without any knowledge of the war waged yesterday.

Flem (our Junior Coder) would frequently rewrite perfectly integrated frontend code because he didn't remember why the Senior Architect had chosen a vanilla JavaScript pattern three weeks earlier. We were blowing out our context windows and burning tokens trying to explain the entire history of the repository inside `.cursorrules`.

The Cure: Retrieval-Augmented Generation

To give the agents long-term, persistent memory without exploding their context windows with hundreds of thousands of lines of code, we built the Swarm Brain.

Powered by Facebook's FAISS (Facebook AI Similarity Search) and sentence-transformers running purely on local CPU hardware, the Swarm Brain acts as the collective hippocampus of the project. It uses a technique called Retrieval-Augmented Generation (RAG).

"When an agent closes a ticket, it is no longer just archived in a text file. It is mathematically vectorized into 384-dimensional space, permanently embedding the context, the code diff, and the architectural rationale into a searchable void."

How It Works: The FAISS Pipeline

There are two distinct parts to the Swarm Brain: Ingestion and Retrieval.

1. Ingestion (`vectorize_task.py`)

Whenever Henry the PM merges a task and drops it into the `tasks/done/` graveyard, our ingestion script fires.

It reads the human-readable Markdown file and passes it through the `all-MiniLM-L6-v2` sentence-transformer model, which converts the English text into a dense 384-dimensional array of floating-point numbers (a vector embedding).

This vector is then stored in our local `db/swarm_brain/faiss.index` database, creating a permanent mathematical fingerprint of what the task accomplished.

2. Retrieval (`query_brain.py`)

The magic happens during the Planning phase of future tasks. When Henry is assigned a new feature that touches old systems, he doesn't start from scratch. He has native access to the `query_swarm_brain` tool exposed via our MCP Server.

He can simply type a question into the CLI: "How did we establish the HTTP Proxy architecture?"

The Python logic turns his question into a vector, then asks the FAISS index for the nearest 384-dimensional points using L2 (Euclidean) distance. It instantly retrieves the exact context snippet from months prior and injects it into his current prompt context.

```
# The agent executes:
python scripts/query_brain.py "How did we fix the VOD playback regression?"

# The Swarm Brain returns:
=== SWARM BRAIN SEARCH RESULTS ===
--- MATCH 1 | Source: TASK-F-56-vod-playback-regression.md | Distance (L2): 0.8124 ---
[The exact code snippet showing how we fixed the regex playlist parser]
```

The Economic Benefit: Zero-Cost Intelligence

One of the most powerful benefits of this architecture is its efficiency. The actual semantic search against the FAISS database does not cost any API tokens.

When an agent queries its memory, the Python script executes entirely on your local machine using your CPU. The open-source sentence-transformers model computes the dense vector embedding locally, meaning we are not paying Anthropic or OpenAI to vectorize our thoughts. The only tokens consumed are the final words retrieved from the database and fed back into Claude's context window. We achieved omniscient memory with a near-zero financial footprint.

Practical Usage: Is it live?

Yes. It is fully operational. If you successfully linked `mcp_server.py` to your Claude Desktop config (as outlined in the Sensory Organs story), Claude already has access to this memory.

You do not need to boot up a separate container or run a server. FAISS is a completely local, lightweight Python package that executes entirely on your CPU. The database physically lives inside the `db/swarm_brain/` folder on your hard drive.

If you want to test it yourself as a human, simply open a terminal in the project root and type:

```
python scripts/query_brain.py "What is the token burn problem?"
```

You will instantly see the Swarm Brain retrieve the historical context from the archived task files.

Flem is no longer amnesiac. The Swarm remembers.