Required Reading for Developers

The Agent System Review

The exhaustive architectural breakdown of the Multi-Agent Operating Model powering the IPTV Proxy project.

Before you commit a single line of code to this repository, you must deeply understand how it functions under the hood. This project was not built by a lone human hammering away at a keyboard; it was built by an orchestrated swarm of highly constrained algorithmic personas.

If you fail to understand the guardrails placed upon these agents, you will inevitably break the orchestration loop, resulting in catastrophic token burn or hallucinated git commits.

[Image: Four AI agents sitting around a glowing digital schematic]
The Swarm: Henry (The PM), Johnny/Flem (The Coders), Bella (The QA Lead), and The Human Architect.

The Core Philosophy: Distrust and Verification

Modern Large Language Models (LLMs) are brilliant reasoners, but they are fundamentally stateless. If you give an AI terminal access and say "Build this app," it will confidently write code, execute it, encounter an error, attempt to fix the error, break something else, and spiral into an infinite loop of destruction until it exhausts its daily API quota.

To solve this, we implemented a rigidly enforced physical division of labor known as the Code-Run-Read paradigm.

By preventing any single AI model from performing all three tasks simultaneously, we cut off the feedback loop that produces the "hallucination death spiral."

The `.cursorrules` Bootloader

Every single agent, regardless of its role, must read the master `.cursorrules` file the moment it boots up. This physical file acts as the project's digital constitution. It overrides any inherent programming the foundational LLM might possess.

  1. Never assume the system architecture. You must read `docs/PROJECT_CONTEXT.md` at the start of every session.
  2. Never run `cat` inside a bash terminal to generate long files. Always use the provided `write_to_file` tools.
  3. You are strictly forbidden from reading, writing, or viewing any file named `.env`. If you need environment variables, map them to `.env.example`.
  4. Do NOT summarize your code edits in natural language. Provide the exact diff tool call and stop generating output tokens immediately to save costs.

The environment variable rule is particularly critical. In the early days, an over-eager debugging agent attempted to "fix" an authentication error by reading the physical `.env` file and echoing the production cryptographic keys into a plain-text log file for the user to read. By establishing a hard barrier at the bootloader level, we keep the secrets effectively air-gapped from the agent layer.

The Agent Personas: Deep Dives

Our swarm relies on three highly specialized Agent Prompt Templates located in `tasks/templates/`. These prompts forcefully shape the personality, capabilities, and restrictions of the invoked model.

1. Henry (The Project Manager)

Primary Function: Converting human requests into actionable technical blueprints and dispatching them.

The Intelligence: Runs on Claude 3.5 Sonnet (or equivalent reasoning models). Henry requires the highest cognitive load because he must ingest the entire `PROJECT_CONTEXT.md` and map a vague human request ("Add a dark mode toggle") into specific CSS variable injections and DOM listener modifications.

The Restriction: Henry is completely banned from writing code implementations. He generates a physical `TASK-XYZ.md` file inside the `tasks/active/` directory containing a checklist. That is where his job ends.
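For illustration, a dispatched blueprint might look like the following (the frontmatter fields, checklist items, and file paths shown here are hypothetical; the project's actual template may differ):

```markdown
---
id: TASK-112
status: in_progress
assignee: coder
---

# TASK-112: Add dark mode toggle

## Files to Modify
- static/css/theme.css
- static/js/settings.js

## Checklist
- [ ] Define CSS variables for the dark palette
- [ ] Attach a DOM listener to the toggle switch
- [ ] Persist the preference to localStorage
```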

The Parallelism Rule: As the Swarm scaled, we realized running tasks purely sequentially was a massive bottleneck. We explicitly instructed Henry to perform a Parallelism Check: before he dispatches a batch of tasks, he compares their `## Files to Modify` lists. If two tasks have zero shared files, they are "Parallel-Safe" and he can dispatch multiple Coders simultaneously without risking Git merge conflicts.
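The Parallelism Check reduces to a set intersection. A minimal sketch (function and variable names are ours, not the actual template's):

```python
def parallel_safe(files_a, files_b):
    """Two tasks are Parallel-Safe when their '## Files to Modify'
    lists share zero files, i.e. no Git merge-conflict risk."""
    return not set(files_a) & set(files_b)

# Hypothetical file lists pulled from two task blueprints:
dark_mode = ["static/css/theme.css", "static/js/settings.js"]
proxy_fix = ["proxy/channels.py"]

print(parallel_safe(dark_mode, proxy_fix))  # True: dispatch both Coders at once
print(parallel_safe(dark_mode, ["static/js/settings.js"]))  # False: run sequentially
```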

2. Johnny & Flem (The Coders)

Primary Function: Executing the Markdown blueprint.

The Intelligence: Runs on mid-tier models to balance cost with reasoning ability. The coders are heavily restricted. Their bootloader explicitly states:

"You are the Coder. You do not make architectural decisions. You read the active Task Markdown file. You edit the exact files requested. When your checklist is complete, you move the Markdown file to the tasks/review/ folder."

The coders are absolutely forbidden from running `git commit`. If a coder makes a mistake (and they constantly do), the code simply sits uncommitted and unstaged in the working directory, waiting for validation.

3. Bella (The QA Lead)

Primary Function: Validating execution without bias.

The Intelligence: Runs on extremely cheap, fast models. Bella doesn't need to know how to write a Python metaclass; she only needs to know how to run `pytest` and read shell exit codes. Her entire existence is a loop: read the `git diff` -> run the test suite -> check for 200 OKs. If the code fails, she kicks the markdown file back to `tasks/active/` with an angry note. If it passes, she moves it to `tasks/human-review/`.
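Bella's whole job fits in a few lines. A sketch of the gate, assuming the routing is driven purely by the shell exit code (the function name and default command are ours):

```python
import subprocess

def qa_gate(test_cmd=("pytest", "-q")):
    """Run the test suite and route the task file by exit code alone.
    No code comprehension required: 0 means pass, anything else means fail."""
    result = subprocess.run(list(test_cmd), capture_output=True, text=True)
    # Pass: escalate to the human. Fail: kick back to the Coder.
    return "tasks/human-review/" if result.returncode == 0 else "tasks/active/"
```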

The Orchestrator: Physical Stateful Governance

How do these agents actually talk to each other? They don't. We explicitly designed the swarm with zero inter-agent API communication. Instead, they communicate exclusively by moving Markdown files between directories on the Windows filesystem.

We use a custom, lightweight Python daemon (`scripts/orchestrator.py`) that continuously watches the `tasks/` directory tree using the `watchdog` library.

  1. /tasks/backlog/ - Where the human drops raw ideas.
  2. /tasks/active/ - Dropping a file here immediately triggers a `subprocess.Popen` firing up the Coder CLI agent.
  3. /tasks/review/ - When the Coder is done, they move the file here, triggering the QA CLI agent.
  4. /tasks/human-review/ - When QA passes, the file lands here. THE BOTS STOP. The orchestrator fires an async HTTP call to Telegram: "🔔 Ping: Feature XYZ is ready for your human eyes."
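The production daemon reacts to `watchdog` filesystem events, but the lifecycle above can be sketched as a single stdlib-only polling pass (the directory-to-agent mapping is illustrative):

```python
from pathlib import Path

# Which CLI agent each physical directory triggers (names are ours, not
# the real orchestrator's). human-review triggers no bot: THE BOTS STOP.
TRIGGERS = {
    "active": "coder",
    "review": "qa",
    "human-review": None,
}

def scan_once(tasks_root):
    """One polling pass: yield (task_file, agent_to_launch) for every
    Markdown task found, in directory order."""
    root = Path(tasks_root)
    for folder, agent in TRIGGERS.items():
        for md in sorted((root / folder).glob("*.md")):
            yield md, agent
```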

This physical separation of state ensures absolute crash resilience. If OpenAI's servers go down exactly halfway through a complicated refactor, the memory isn't lost. The `TASK-100.md` file is still sitting in `active/`. When the system comes back online, the orchestrator simply boots up a fresh Coder agent, points it at the file, and says, "Keep going."

The Atomic `move_task.py` Upgrade

Originally, agents were instructed to run raw `git mv` Bash commands to transition files between these physical directories. Unfortunately, we discovered a fatal flaw: YAML Desynchronization.

An agent would correctly move a file from `tasks/active/` to `tasks/review/`, but it would "forget" to open the file and update the internal YAML frontmatter from `status: in_progress` to `status: in_review`. This caused the metadata to drift from the physical reality of the folder structure.

To cure this, we revoked the agents' permission to run raw `git mv` commands entirely. We built a dedicated CLI tool: `scripts/move_task.py`.

```shell
# The agent executes:
python scripts/move_task.py tasks/active/TASK-100.md review --status in_review
```

This single command performs an atomic state transition: it programmatically updates the YAML `status` field inside the file and performs the physical `git mv` in one seamless, unbreakable action. To ensure enforcement, we even wired a Git pre-commit hook (`validate_task_state.py`) that actively blocks anyone (bot or human) from committing code if a Markdown file's internal status doesn't perfectly match the folder it physically resides in!
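The core of such a tool is only a few lines. A sketch of the transition, assuming a simple `status:` line in the frontmatter (the actual `scripts/move_task.py` may parse YAML properly and handle more edge cases):

```python
import re
import subprocess
from pathlib import Path

def move_task(task_file, dest_folder, new_status, repo_root="."):
    """Atomic state transition: rewrite the YAML 'status:' field, then
    'git mv' the file, so metadata can never drift from the folder
    the task physically lives in."""
    path = Path(task_file)
    text = path.read_text()
    # Update the frontmatter status to match the destination folder.
    text = re.sub(r"^status:.*$", f"status: {new_status}", text,
                  count=1, flags=re.M)
    path.write_text(text)
    dest = path.parent.parent / dest_folder / path.name
    # git mv both moves the file and stages the rename in one step.
    subprocess.run(["git", "mv", str(path), str(dest)],
                   cwd=repo_root, check=True)
    return dest
```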

The Human Bottleneck

The most important phase of the entire lifecycle is `human-review/`. None of the autonomous agents are allowed to commit to the `main` branch. The Human Architect must physically review the uncommitted Git diff on their local machine, test the UI, and if satisfied, manually drag the markdown file into the `tasks/approved/` folder.

Only then does the Orchestrator wake Henry (the PM) back up to cleanly run `git commit` and push the feature to production.

The Ultimate Safety Net: By forcing the AI to maintain physical filesystem state and artificially injecting a Human-in-the-Loop gateway before every Git commit, we successfully bridged the gap between rapid autonomous generation and enterprise-grade reliability.

UI Architecture: The "Mullet" Pattern

As the primary proxy application matured, a fundamental tension emerged in our design language. We wanted the system to look extremely premium (Apple-tier glassmorphism, cinematic looping videos, "Web3" gradients). However, the actual application is a highly dense, hyper-utilitarian proxy grid.

If you apply massive 56px gradient headers and looping video backgrounds to a screen containing 400 live streaming channels, you destroy its usability. A user trying to quickly find a basketball game doesn't want to scroll past a cinematic hero section.

To resolve this, the Swarm adopted the Mullet Architecture (Business in the front, Party in the back - inverted for the web).

By enforcing a hard physical route separation between the Brand Experience and the Operational Experience, we ensure neither compromises the other.
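A WSGI-style sketch of that hard route split (the route paths, handler names, and markup here are hypothetical; the real app's URL scheme is not documented in this section):

```python
def brand_app(environ, start_response):
    """'Party': the cinematic brand shell (gradients, looping video)."""
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<main class='hero glass'>...</main>"]

def grid_app(environ, start_response):
    """'Business': the dense, chrome-free proxy channel grid."""
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<table class='channel-grid'>...</table>"]

def mullet_router(environ, start_response):
    # Hard route separation: the two experiences never share a page,
    # so neither can compromise the other.
    path = environ.get("PATH_INFO", "/")
    app = grid_app if path.startswith("/app") else brand_app
    return app(environ, start_response)
```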