In the beginning, there was only the Chat Window. And the Chat Window was chaos.
When we first attempted to build this complex IPTV infrastructure using Generative AI, we fell into the exact same trap as every other development team in 2024: we treated the AI like an omniscient senior engineer sitting next to us at a desk.
We would open a single, massive conversation thread and say, "Hey, write a Python server that proxies HLS video streams, manages AES encryption keys, and decrypts payloads on the fly." The AI would happily oblige, spitting out 400 lines of brilliant code. Then we would say, "Actually, add a circuit breaker. And also, make sure it caches the M3U8 files in memory."
For the first few hours, it felt like absolute magic. The AI was a 10x developer. But as the system grew into multiple files—routing, authentication, background polling, and configuration state—the magic quickly dissolved into a nightmare of token exhaustion and catastrophic hallucinations.
Because the AI only "remembers" what is currently held inside its active context window, long-running chat sessions eventually hit a critical breaking point. The agent would confidently rewrite `server.py` to use a global dictionary for caching, completely "forgetting" that three hours earlier it had specifically implemented a thread-safe `RLock` mechanism in `auth.py`. The resulting code would look perfect, but silently drop connections in production.
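To make the failure concrete, here is a minimal sketch of the kind of guard the agent "forgot." The names (`KeyCache`, `get_or_fetch`) are illustrative, not taken from the real `auth.py`; the point is that replacing a structure like this with a bare global dictionary compiles cleanly but silently reintroduces the race condition:

```python
import threading

# Illustrative sketch: an AES-key cache that must be locked because the
# proxy serves decryption requests from many threads at once.
class KeyCache:
    def __init__(self):
        self._lock = threading.RLock()  # re-entrant: nested calls won't deadlock
        self._keys: dict[str, bytes] = {}

    def get_or_fetch(self, url: str, fetch) -> bytes:
        """Return the cached key for `url`, fetching it at most once."""
        with self._lock:
            if url not in self._keys:
                self._keys[url] = fetch(url)
            return self._keys[url]
```

An LLM rewriting the caching layer from a context window that no longer contains this file will happily swap the guarded class for `CACHE = {}` — identical behavior single-threaded, dropped connections under load.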
Our breaking point arrived when a single vague prompt—"Fix the proxy timeout bug"—caused an un-scoped Coder Agent to aggressively grep through the entire repository, hallucinate that the `.env` file was misconfigured, and attempt to rewrite our private cryptographic secrets.
We realized that modern Large Language Models (LLMs) are incredibly powerful at isolated deduction, but they are utterly terrible at long-term architectural object permanence. They are stateless. If you try to hold the state of a massive software project entirely inside the raw context history of an API call, the architecture will inevitably collapse under its own weight.
To solve this, we had to completely change our mental model of what an AI Agent actually is. We stopped treating the agent like a human coworker, and started treating it like an **untrusted, amnesiac contractor**.
We designed a system where the AI retains absolutely zero long-term memory in its code session. Instead, **the physical filesystem of the project repository became the single source of truth.**
Enter the `tasks/` directory state machine.
Rather than holding a conversation with an agent about what needed to be built, we forced the agent to write a physical Markdown file (`TASK-XYZ.md`) detailing the exact sub-components, requirements, and testing criteria of the feature. This physical document—not the chat history—became the authoritative blueprint.
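A blueprint file might look like the following. The frontmatter fields and section headings here are illustrative conventions, not a fixed schema from the real system:

```markdown
---
id: TASK-014
status: pending
owner: johnny
---
# Cache M3U8 playlists in memory

## Sub-components
- In-memory playlist store with a short TTL

## Requirements
- Serve cached playlists without hitting upstream within the TTL window

## Testing criteria
- Unit tests for cache hit, miss, and expiry all pass
```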
At this stage, we investigated using popular Multi-Agent Frameworks like CrewAI or Microsoft AutoGen to manage our swarm. However, we quickly discovered they were brittle and dangerously abstract: they executed code in simulated environments, or strung agent responses together as chained JSON objects held in memory. If a script crashed midway through a 40-minute refactor, that in-memory JSON state was lost, and the entire run had to start from scratch.
We threw out the frameworks and built a bare-metal Python watchdog: `orchestrator.py`.
The orchestrator is incredibly simple: it doesn't know what the agents are thinking, and it doesn't care about their API JSON payloads. It purely watches physical folders on the Windows filesystem.
When `TASK-XYZ.md` physically moves into the `active/` folder, the Orchestrator boots up Johnny (The Coder) in a completely fresh, isolated subprocess command line, hands him the file path, and says "Build this."
Because the state is physically serialized in the folder structure, the system is fully crash-resilient. If the power goes out, or the API rate-limits the Coder halfway through a file, the Orchestrator simply restarts. It sees `TASK-XYZ.md` is still sitting in `tasks/active/`, and it boots up a brand new Coder agent to pick up exactly where the last one died.
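The core loop can be sketched in a few dozen lines. Everything here is an assumption about shape rather than the actual `orchestrator.py`: the folder names, the `TASK-*.md` naming convention, and the placeholder `coder_agent.py` CLI are illustrative:

```python
import subprocess
import time
from pathlib import Path

# Hypothetical layout: tasks/active/ holds the blueprints awaiting a Coder.
TASKS_DIR = Path("tasks")
ACTIVE_DIR = TASKS_DIR / "active"


def pending_tasks(active_dir: Path) -> list[Path]:
    """Return every task blueprint currently parked in active/."""
    return sorted(active_dir.glob("TASK-*.md"))


def run_coder(task_file: Path) -> int:
    """Boot a fresh, isolated Coder subprocess for one task.
    `coder_agent.py` stands in for whatever CLI wraps the LLM."""
    proc = subprocess.run(
        ["python", "coder_agent.py", str(task_file)],
        capture_output=True,
        text=True,
    )
    return proc.returncode


def watch(poll_seconds: float = 5.0) -> None:
    """Crash-resilient loop: the state lives in the folders, not in memory.
    If this process dies mid-task, the file is still in active/ on restart."""
    while True:
        for task in pending_tasks(ACTIVE_DIR):
            run_coder(task)
        time.sleep(poll_seconds)


if __name__ == "__main__":
    watch()
```

Note that the orchestrator never inspects the agent's output; whether a task is finished is decided purely by whether its file has left `active/`.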
By enforcing this rigid physical state machine, we achieved what we call the **Code-Run-Read** paradigm: agents were no longer permitted to plan, execute, and verify their own code within a single session. Every phase started fresh from the state serialized on disk.
The transformation was staggering. But we were still relying on agents to execute raw `git mv` Bash commands and remember to manually edit the YAML frontmatter. This led to occasional drift, where a file was physically moved but its frontmatter still said `status: in_progress`.
So, we took it one step further. We revoked their Bash `git mv` privileges and built `scripts/move_task.py`, a dedicated, atomic lifecycle tool for the Swarm. Now, a single Python command simultaneously updates the YAML status and physically transports the file to the next directory. We even wired a Git pre-commit hook that blocks humans from pushing code if the swarm's YAML state is out of sync.
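A minimal sketch of such a tool follows. The folder-to-status mapping and the frontmatter layout are our illustrative conventions, not the real `scripts/move_task.py`; the essential property is that the status rewrite and the file move happen in one command, so they can never drift apart:

```python
"""Sketch of an atomic lifecycle tool: one command rewrites the YAML
`status:` field *and* moves the file to the next lifecycle folder."""
import re
import sys
from pathlib import Path

# Hypothetical mapping from lifecycle folder name to frontmatter status.
STATUS_FOR_DIR = {
    "backlog": "pending",
    "active": "in_progress",
    "done": "complete",
}


def move_task(task_file: Path, dest_dir: Path) -> Path:
    """Move a task file into dest_dir, syncing its frontmatter status."""
    status = STATUS_FOR_DIR[dest_dir.name]
    text = task_file.read_text(encoding="utf-8")
    # Rewrite the first `status:` line inside the YAML frontmatter.
    new_text, n = re.subn(
        r"(?m)^status:\s*\S+$", f"status: {status}", text, count=1
    )
    if n != 1:
        raise ValueError(f"no status field found in {task_file}")
    dest = dest_dir / task_file.name
    # Write the destination first, then delete the source, so a crash
    # between the two steps never loses the task.
    dest.write_text(new_text, encoding="utf-8")
    task_file.unlink()
    return dest


if __name__ == "__main__":
    src, folder = Path(sys.argv[1]), Path("tasks") / sys.argv[2]
    print(move_task(src, folder))
```

The same `STATUS_FOR_DIR` table can back the pre-commit check: walk each lifecycle folder, parse each file's `status:` line, and fail the commit on any mismatch.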
We had finally tamed the chaos. We turned a destructive, over-eager chatbot into a highly regulated, autonomous software engineering factory.