It is incredibly tempting to route every single API request to Claude 3.5 Sonnet or GPT-4o. They are among the smartest, most reliable models available.
They are also incredibly expensive and painfully slow.
In our IPTV Proxy project, agent swarms spin up continuously. A simple task like renaming a CSS file might require three separate CLI spins: the Coder writes it, QA checks it, and another agent commits it. If every spin costs $0.10 and takes 15 seconds, that one rename costs $0.30 and 45 seconds, and across hundreds of tasks the project grinds to a halt in both budget and wall-clock time.
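To put rough numbers on that, here is a back-of-the-envelope sketch using the illustrative figures above ($0.10 and 15 seconds per spin, three spins per task). The function name `swarm_cost` and the 200-task count are hypothetical and exist only for illustration.

```python
# Back-of-the-envelope cost of routing every spin to a premium model.
# The per-spin figures are the illustrative ones from the text, not measurements.
COST_PER_SPIN_USD = 0.10
SECONDS_PER_SPIN = 15
SPINS_PER_TASK = 3  # coder, QA, commit


def swarm_cost(tasks: int) -> tuple[float, int]:
    """Return (total dollars, total seconds) for `tasks` trivial tasks run serially."""
    spins = tasks * SPINS_PER_TASK
    return round(spins * COST_PER_SPIN_USD, 2), spins * SECONDS_PER_SPIN


dollars, seconds = swarm_cost(200)  # 200 is a hypothetical task count
print(f"${dollars:.2f} and {seconds / 60:.0f} minutes")
```

Even at this small scale, trivial chores alone consume tens of dollars and hours of wall-clock time, which is the pressure that motivates splitting work across model tiers.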
Our solution was to implement Agent Economics via carefully tuned markdown templates in the `tasks/templates/` folder, assigning lightweight models to isolated grunts, and heavy models to critical thinkers.
The PM (Henry): The Expensive Thinker
We use Claude 3.5 Sonnet for Henry, the Product Manager. His template (`pm-agent-prompt.md`) is heavy and demands deep architectural reasoning.
Henry is permitted to cost money because he only runs once at the beginning of a feature cycle, and once at the end to prepare the brief for the Human. The stakes are high: if Henry writes a confusing specification document, the entire coding swarm will chase their tails for an hour. He must be smart.
The Senior Coder (Johnny): The Heavy Lifter
For complex backend refactoring (like the `server.py` routing), we spin up Johnny using Claude or Gemini 1.5 Pro via `codex-agent-prompt.md`.
The instructions in this template are deliberately explicit. We force-feed the model the project conventions so it doesn't hallucinate a Flask application or attempt to install SQLAlchemy. Johnny gets the expensive tokens because he is allowed to touch `server.py`.
The Grunts (Flem & Bella): Cheap and Fast
If a task is marked `required_role: junior_coder` or `required_role: qa_lead`, the orchestrator spins up Flem or Bella using `minimax-m2.5-free` or an open-source `llama3` model. They cost practically nothing to run.
Bella doesn't need to be smart enough to invent a new cache routing system. She only needs to be smart enough to run a bash command, scan the output for `FAILED`, and move a text file back a directory using `git mv`.
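Bella's whole job is mechanical enough to fit in a few lines. The following is a minimal sketch of that loop, not the project's actual tooling: the function names and the `test_cmd`, `task_file`, and `rejected_dir` parameters are all hypothetical.

```python
import subprocess


def has_failure(test_output: str) -> bool:
    """True if any line of the test output contains the marker FAILED."""
    return any("FAILED" in line for line in test_output.splitlines())


def qa_check(test_cmd: list[str], task_file: str, rejected_dir: str) -> bool:
    """Run the tests; on failure, move the task file with `git mv`.

    Returns True when no FAILED marker is found. All three parameters
    are hypothetical names chosen for this sketch.
    """
    result = subprocess.run(test_cmd, capture_output=True, text=True)
    if has_failure(result.stdout + result.stderr):
        # Must be run inside a git work tree; check=True raises if git fails.
        subprocess.run(["git", "mv", task_file, rejected_dir], check=True)
        return False
    return True
```

Nothing here requires reasoning beyond string matching and one shell call, which is why a cheap model is sufficient for the role.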
By enforcing this hierarchy through distinct markdown files in the `templates/` folder, our swarm runs at near-zero cost for roughly 80% of the workflow, tapping the expensive "brains" only for initial blueprints and final reviews.
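The hierarchy amounts to a lookup from a task's `required_role` to an agent, a model, and a template. The sketch below shows one way an orchestrator might hold that table. It rests on several assumptions: the role keys `product_manager` and `senior_coder`, the two junior template filenames, the model identifier strings, and the pairing of Flem and Bella with a specific cheap model are all hypothetical; only `junior_coder`, `qa_lead`, `pm-agent-prompt.md`, and `codex-agent-prompt.md` come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    name: str
    model: str      # placeholder identifier, not a verified API model name
    template: str   # file expected under tasks/templates/


ROUTES = {
    # Expensive tier: runs rarely, high stakes.
    "product_manager": AgentProfile("Henry", "claude-3.5-sonnet", "pm-agent-prompt.md"),
    "senior_coder": AgentProfile("Johnny", "gemini-1.5-pro", "codex-agent-prompt.md"),
    # Cheap tier: runs constantly, narrow tasks. Template names are hypothetical.
    "junior_coder": AgentProfile("Flem", "minimax-m2.5-free", "junior-agent-prompt.md"),
    "qa_lead": AgentProfile("Bella", "llama3", "qa-agent-prompt.md"),
}


def resolve(required_role: str) -> AgentProfile:
    """Map a task's required_role to the agent that should handle it."""
    try:
        return ROUTES[required_role]
    except KeyError:
        # Fail loudly instead of silently falling back to an expensive model.
        raise ValueError(f"unknown required_role: {required_role!r}") from None


print(resolve("qa_lead"))
```

Raising on an unknown role is a deliberate choice in this sketch: a silent default to the premium tier would quietly undo the savings the hierarchy is meant to provide.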