A fatal flaw of early autonomous agents is their utter lack of temporal awareness. They exist perfectly in the moment of their instantiation, solving the exact bug defined in their Markdown ticket without any knowledge of the war waged yesterday.
Flem (our Junior Coder) would frequently rewrite perfectly integrated frontend code because he didn't remember why the Senior Architect chose a vanilla JavaScript pattern three weeks ago. We were blowing out our context windows and burning tokens trying to explain the entire history of the repository inside `.cursorrules`.
To give the agents long-term, persistent memory without exploding their Context Lengths with hundreds of thousands of lines of code, we built the Swarm Brain.
Powered by Facebook's FAISS (Facebook AI Similarity Search) and sentence-transformers running purely on local CPU hardware, the Swarm Brain acts as the collective hippocampus of the project. It is built on a technique called Retrieval-Augmented Generation (RAG).
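In plain terms, the RAG loop is just "retrieve, then generate." Here is a minimal sketch of the pattern with stub retrieval and generation functions standing in for our real components (not our actual implementation):

```python
def rag_answer(question, retrieve, generate):
    """Retrieval-Augmented Generation in miniature:
    fetch relevant memories, prepend them to the prompt, then generate."""
    snippets = retrieve(question)          # semantic search, runs locally
    context = "\n".join(snippets)
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)                # only this step costs API tokens

# Toy stand-ins to show the flow end to end:
memories = ["We chose an HTTP proxy to sidestep CORS during local dev."]
answer = rag_answer(
    "How did we establish the HTTP Proxy architecture?",
    retrieve=lambda q: memories,
    generate=lambda p: f"LLM saw {len(p)} chars of prompt",
)
```

The key property the rest of this story relies on: only `generate` talks to a paid model; `retrieve` is free, local math.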
There are two distinct parts to the Swarm Brain: Ingestion and Retrieval.
Whenever Henry the PM merges a task and drops it into the `tasks/done/` graveyard, our ingestion script fires. It reads the human-readable Markdown file and passes it through the all-MiniLM-L6-v2 machine learning model, which converts the English text into a dense array of 384 floating-point numbers (a vector embedding). That vector is then stored in our local `db/swarm_brain/faiss.index` database, creating a permanent mathematical fingerprint of what the task accomplished.
The magic happens during the Planning phase of future tasks. When Henry is assigned a new feature that touches old systems, he doesn't start from scratch. He has native access to the `query_swarm_brain` tool exposed via our MCP Server. He can literally ask out loud into the CLI: "How did we establish the HTTP Proxy architecture?" The Python logic turns his question into a vector, then asks the FAISS database for the nearest 384-dimensional points in the index using L2 (Euclidean) distance. It instantly retrieves the exact context snippet from months prior and injects it into his current prompt context.
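Under the hood, that nearest-neighbor lookup is a very small amount of math. Here is a pure-NumPy sketch of the exact L2 search that `faiss.IndexFlatL2` performs, using toy 4-dimensional vectors and made-up snippets in place of real 384-dimensional embeddings:

```python
import numpy as np

# Toy "memory": one row per archived task embedding (real ones are 384-dim).
memory = np.array([
    [0.9, 0.1, 0.0, 0.0],   # snippet 0
    [0.0, 0.0, 0.8, 0.2],   # snippet 1
], dtype="float32")
snippets = [
    "How we built the HTTP proxy",
    "Why we chose vanilla JavaScript",
]

# The embedded question lands close to snippet 0 in vector space.
query = np.array([0.85, 0.15, 0.05, 0.0], dtype="float32")

# L2 (Euclidean) distance from the query to every stored vector --
# the same computation faiss.IndexFlatL2.search performs exhaustively.
dists = np.linalg.norm(memory - query, axis=1)
best = int(np.argmin(dists))
print(snippets[best])  # -> How we built the HTTP proxy
```

The smaller the L2 distance, the closer the semantic match, so the row with the minimum distance is the memory that gets injected into the prompt.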
One of the most powerful benefits of this architecture is its efficiency. The actual semantic search against the FAISS database does not cost any API tokens. When an agent queries its memory, the Python script executes entirely on your local machine using your CPU. The open-source sentence-transformers model computes the dense vector embedding locally, meaning we are not paying Anthropic or OpenAI to vectorize our thoughts. The only tokens consumed are the final words retrieved from the database and fed back into Claude's context window. We achieved omniscient memory with a near-zero financial footprint.
Yes. It is fully operational. If you successfully linked `mcp_server.py` to your Claude Desktop config (as outlined in the Sensory Organs story), Claude already has access to memory. You do not need to boot up a separate container or run a server. FAISS is a completely local, lightweight Python package that executes entirely on your CPU. The database physically lives inside the `db/swarm_brain/` folder on your hard drive.
If you want to test it yourself as a human, simply open a terminal in the project root and type:
You will instantly see the Swarm Brain retrieve the historical context from the archived task files.
Flem is no longer amnesiac. The Swarm remembers.