
Agent Economics & The Swarm Polish

Why use expensive frontier models to flip true/false settings? A deep-dive into our routing logic, template instructions, and skill vs. cost trade-offs.

*Cyberpunk illustration of all 5 Swarm agents sitting around a table*

It is incredibly tempting to route every single API request to Claude 3.5 Sonnet or GPT-4o. They are the smartest, most reliable engines available.

They are also incredibly expensive and painfully slow.

In our IPTV Proxy project, agent swarms spin up continuously. A simple task like renaming a CSS file might require three separate CLI spins: the Coder writes it, QA checks it, and another agent commits it. If every spin costs $0.10 and takes 15 seconds, the project grinds to a halt, both in dollars and in wall-clock time.

Our solution was to implement Agent Economics via carefully tuned markdown templates in the `tasks/templates/` folder: lightweight models are assigned to isolated grunt work, and heavy models to the critical thinkers.
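The role-to-model assignment can be pictured as a small lookup inside the orchestrator. This is an illustrative sketch, not the project's actual code: the `ROLE_MODELS` table and `pick_model` helper are hypothetical names, though the roles and model IDs come from the templates described in this article.

```python
# Hypothetical sketch of role-to-model routing. The table and helper
# are illustrative; only the roles and model names come from the article.
ROLE_MODELS = {
    "product_manager": "claude-3-5-sonnet",   # Henry: the expensive thinker
    "senior_coder":    "claude-3-5-sonnet",   # Johnny: the heavy lifter
    "junior_coder":    "minimax-m2.5-free",   # Flem: cheap grunt
    "qa_lead":         "llama3",              # Bella: cheap grunt
}

def pick_model(required_role: str) -> str:
    """Map a task spec's required_role to the cheapest adequate model."""
    # Unknown roles fall back to the cheapest model rather than the smartest.
    return ROLE_MODELS.get(required_role, "llama3")
```

The design point is that the default is cheap: a task has to explicitly declare a heavyweight role to earn expensive tokens.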

The PM (Henry): The Expensive Thinker

We use Claude 3.5 Sonnet for Henry, the Product Manager. His template (pm-agent-prompt.md) is heavy and demands deep architectural reasoning.

```markdown
# Excerpt from pm-agent-prompt.md

### Slicing Features
- Pick features from the roadmap pipeline (Tier 1 first)
- Write task specs using the template in `tasks/templates/task-spec.md`
- Save specs to `tasks/backlog/`
- Be extremely specific

### Reviewing Completed Work
- Read the diff: `git diff main...feature/TASK`
- Read the spec's QA Results section
...
- Do NOT make architectural decisions without noting them in the spec
```

Henry is permitted to cost money because he only runs once at the beginning of a feature cycle, and once at the end to prepare the brief for the Human. The stakes are high: if Henry writes a confusing specification document, the entire coding swarm will chase their tails for an hour. He must be smart.

The Senior Coder (Johnny): The Heavy Lifter

For complex backend refactoring (like the `server.py` routing), we spin up Johnny using Claude or Gemini 1.5 Pro via `codex-agent-prompt.md`.

```markdown
# Excerpt from codex-agent-prompt.md

## Project Conventions
- Python HTTP server using `http.server.BaseHTTPRequestHandler` (no Flask/Django)
- Routes registered in `server.py` via `do_GET`/`do_POST` dispatch
- HTML shells are Python string templates in standalone `*_page.py` files or `ui.py`
```

Notice how explicit the instructions are. We are force-feeding the model the project conventions so it doesn't hallucinate a Flask application or attempt to install SQLAlchemy. Johnny gets the expensive tokens because he is allowed to touch `server.py`.
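For readers unfamiliar with the stdlib-only convention being force-fed here, this is roughly what it looks like. The route table and the `render_home`/`render_status` helpers are hypothetical; the project's real `server.py` will differ.

```python
# Minimal sketch of the no-framework routing convention from the excerpt:
# a plain dict dispatched from do_GET on BaseHTTPRequestHandler.
# Helper names and routes are illustrative, not the project's real code.
from http.server import BaseHTTPRequestHandler, HTTPServer

def render_home() -> str:
    return "<html><body>home</body></html>"

def render_status() -> str:
    return "<html><body>ok</body></html>"

ROUTES = {"/": render_home, "/status": render_status}

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Dispatch on the raw path; anything unregistered is a 404.
        page = ROUTES.get(self.path)
        if page is None:
            self.send_error(404)
            return
        body = page().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To serve: HTTPServer(("127.0.0.1", 8080), Handler).serve_forever()
```

Because the whole pattern fits in one screen, spelling it out in the template is cheap insurance against the model reaching for Flask.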

The Grunts (Flem & Bella): Cheap and Fast

If a task is marked `required_role: junior_coder` or `required_role: qa_lead`, the orchestrator spins up Flem or Bella using the `minimax-m2.5-free` or open-source `llama3` models. They cost practically nothing to run.

```markdown
# Excerpt from qa-agent-prompt.md

### 4. Tests
Run every command under "Test Commands" and record results:

    python -m unittest discover -s tests -p "test_*.py"

If checks fail:
1. Update the spec notes detailing the failures.
2. Update YAML `status: in_progress`
3. MOVE the spec file BACK to `tasks/active/` for the Coder to fix.
```

Bella doesn't need to be smart enough to invent a new cache routing system. She only needs to be smart enough to run a bash command, read the output for `FAILED`, and physically move a text file back a directory using `git mv`.
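Bella's whole job can be sketched in a dozen lines. This is an assumption-laden illustration, not the actual agent code: the `run_qa` and `suite_failed` names are invented, and the spec path handling is simplified.

```python
# Hypothetical sketch of the QA loop: run the suite, look for failure,
# and git-mv the spec back to tasks/active/ if anything broke.
# Function names and paths are illustrative.
import subprocess

def suite_failed(returncode: int, output: str) -> bool:
    """QA verdict: a non-zero exit or a FAILED marker means the suite failed."""
    return returncode != 0 or "FAILED" in output

def run_qa(spec_path: str) -> bool:
    result = subprocess.run(
        ["python", "-m", "unittest", "discover", "-s", "tests", "-p", "test_*.py"],
        capture_output=True, text=True,
    )
    if suite_failed(result.returncode, result.stdout + result.stderr):
        # Hand the task back to the Coder, preserving git history.
        subprocess.run(["git", "mv", spec_path, "tasks/active/"], check=True)
        return False
    return True
```

Nothing here requires reasoning; it is exactly the kind of mechanical string-matching a cheap model handles reliably.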

By enforcing this hierarchy through distinct markdown files in the `templates/` folder, our Swarm operates at near-zero cost for 80% of the workflow, tapping the expensive "brains" only for initial blueprints and final reviews.