Agent Economics and the Swarm Polish

[Figure: Cyberpunk illustration of all 5 Swarm agents sitting around a table]

It is incredibly tempting to route every single API request to Claude 3.5 Sonnet or GPT-4o. They are among the smartest, most reliable models available.

They are also incredibly expensive and painfully slow.

In our IPTV Proxy project, agent swarms spin up continuously. A simple task like renaming a CSS file might require three separate CLI spins: the Coder writes it, the QA checks it, and another agent commits it. If every spin costs $0.10 and takes 15 seconds, that one rename costs $0.30 and 45 seconds. Multiply that across a continuously running swarm and the project grinds to a halt in both budget and wall-clock time.
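To make the arithmetic concrete, here is a small sketch that totals cost and time for the three spins described above. The `Spin` class and `task_totals` function are hypothetical names used only for illustration, and the figures are the $0.10 and 15 seconds quoted above, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spin:
    """One CLI spin of an agent (hypothetical structure for illustration)."""
    agent: str
    cost_usd: float
    seconds: float


def task_totals(spins):
    """Sum dollar cost and wall-clock time, assuming spins run one after another."""
    total_cost = sum(s.cost_usd for s in spins)
    total_seconds = sum(s.seconds for s in spins)
    return total_cost, total_seconds


# The CSS rename from the text: coder, QA, then a commit agent.
rename_css = [
    Spin("coder", 0.10, 15),
    Spin("qa", 0.10, 15),
    Spin("committer", 0.10, 15),
]

cost, seconds = task_totals(rename_css)
print(f"${cost:.2f} and {seconds:.0f}s for one trivial rename")
```

Swapping the per-spin figures for cheaper, faster models on the QA and commit steps shows how quickly the totals drop.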

Our solution was to implement Agent Economics via carefully tuned markdown templates in the `tasks/templates/` folder, assigning lightweight models to isolated grunt work and heavy models to critical thinking.

The PM (Henry): The Expensive Thinker

We use Claude 3.5 Sonnet for Henry, the Product Manager. His template (`pm-agent-prompt.md`) is heavy and demands deep architectural reasoning.

            # Excerpt from pm-agent-prompt.md
            ### Slicing Features
            - Pick features from the roadmap pipeline (Tier 1 first)
            - Write task specs using the template in `tasks/templates/task-spec.md`
            - Save specs to `tasks/backlog/`
            - Be extremely specific

            ### Reviewing Completed Work
            - Read the diff: `git diff main...feature/TASK`
            - Read the spec's QA Results section
            ...
            - Do NOT make architectural decisions without noting them in the spec
        

Henry is permitted to cost money because he runs only twice per feature cycle: once at the beginning, and once at the end to prepare the brief for the Human. The stakes are high: if Henry writes a confusing specification document, the entire coding swarm will chase its tail for an hour. He must be smart.

The Senior Coder (Johnny): The Heavy Lifter

For complex backend refactoring (like the `server.py` routing), we spin up Johnny using Claude or Gemini 1.5 Pro via `codex-agent-prompt.md`.

            # Excerpt from codex-agent-prompt.md
            ## Project Conventions
            - Python HTTP server using `http.server.BaseHTTPRequestHandler` (no Flask/Django)
            - Routes registered in `server.py` via `do_GET`/`do_POST` dispatch
            - HTML shells are Python string templates in standalone `*_page.py` files or `ui.py`
        

Notice how explicit the instructions are. We are force-feeding the model the project conventions so it doesn't hallucinate a Flask application or attempt to install SQLAlchemy. Johnny gets the expensive tokens because he is allowed to touch `server.py`.

The Grunts (Flem & Bella): Cheap and Fast

If a task is marked `required_role: junior_coder` or `required_role: qa_lead`, the orchestrator spins up Flem or Bella using the `minimax-m2.5-free` or open-source `llama3` models. They cost practically nothing to run.
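A minimal sketch of how an orchestrator might route on `required_role`. Only `junior_coder` and `qa_lead` are role names taken from the text. The `pm` and `senior_coder` keys, the `heavy`/`light` tier labels, the `ROLE_TABLE` name, and the `pick_agent` function are assumptions for illustration, not the project's actual orchestrator.

```python
# Hypothetical role table. `junior_coder` and `qa_lead` come from the text;
# `pm` and `senior_coder` are assumed names for Henry's and Johnny's roles.
ROLE_TABLE = {
    "pm": ("Henry", "heavy"),
    "senior_coder": ("Johnny", "heavy"),
    "junior_coder": ("Flem", "light"),
    "qa_lead": ("Bella", "light"),
}


def pick_agent(spec_front_matter: dict) -> tuple[str, str]:
    """Return (agent name, model tier) for a task spec's YAML front matter."""
    role = spec_front_matter.get("required_role")
    if role not in ROLE_TABLE:
        # Fail loudly instead of silently falling back to an expensive model.
        raise ValueError(f"unknown required_role: {role!r}")
    return ROLE_TABLE[role]


print(pick_agent({"required_role": "qa_lead"}))
```

Raising on an unknown role is a deliberate choice in this sketch: a silent default would either waste heavy tokens on grunt work or hand `server.py` to a lightweight model.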

            # Excerpt from qa-agent-prompt.md
            ### 4. Tests
            Run every command under "Test Commands" and record results:
            python -m unittest discover -s tests -p "test_*.py"

            If checks fail:
            1. Update the spec notes detailing the failures.
            2. Update YAML `status: in_progress`
            3. MOVE the spec file BACK to `tasks/active/` for the Coder to fix.
        

Bella doesn't need to be smart enough to invent a new cache routing system. She only needs to be smart enough to run a bash command, scan the output for `FAILED`, and move the spec file back to `tasks/active/` using `git mv`.

By enforcing this hierarchy through distinct markdown files in the `tasks/templates/` folder, our Swarm operates at near-zero operating cost for 80% of the workflow, only tapping the expensive "brains" for initial blueprints and final reviews.