In a bold push toward autonomous AI coding, Anthropic deployed 16 Claude Opus 4.6 agents against a shared codebase with minimal supervision, aiming to create a C compiler from scratch. Over two weeks and roughly 2,000 Claude Code sessions, the effort produced a 100,000-line Rust-based compiler capable of booting a Linux 6.9 kernel across x86, ARM, and RISC-V targets, at a cost of about $20,000 in API fees.
Each Claude instance ran inside its own Docker container and worked on its own clone of a shared Git repository. The experiment relied on a new Claude Code feature called agent teams, but there was no orchestration layer directing traffic: agents claimed tasks by writing lock files, pushed completed code upstream, and resolved merge conflicts among themselves as needed.
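To make the coordination scheme concrete, here is a minimal sketch of how lock-file-based task claiming can work, written in Rust like the compiler itself. This is purely illustrative and not Anthropic's actual tooling; the function name claim_task, the locks/ directory, the task ID "codegen-riscv", and the agent ID "agent-07" are all hypothetical. The key idea is that atomically creating a file either succeeds (the task is yours) or fails because another agent got there first.

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::path::Path;

/// Illustrative sketch (not Anthropic's tooling): claim a task by atomically
/// creating a lock file. If the file already exists, another agent owns the task.
fn claim_task(locks_dir: &Path, task_id: &str, agent_id: &str) -> std::io::Result<bool> {
    let lock_path = locks_dir.join(format!("{task_id}.lock"));
    match OpenOptions::new()
        .write(true)
        .create_new(true) // fails with AlreadyExists if the file is already there
        .open(&lock_path)
    {
        Ok(mut file) => {
            // Record which agent holds the lock, for debugging and cleanup.
            writeln!(file, "{agent_id}")?;
            Ok(true)
        }
        Err(e) if e.kind() == std::io::ErrorKind::AlreadyExists => Ok(false),
        Err(e) => Err(e),
    }
}

fn main() -> std::io::Result<()> {
    let locks = Path::new("locks");
    std::fs::create_dir_all(locks)?;
    if claim_task(locks, "codegen-riscv", "agent-07")? {
        println!("claimed task, starting work");
        // ... do the work, commit, push, then delete the lock file ...
    } else {
        println!("task already claimed by another agent");
    }
    Ok(())
}
```

Because the lock files live in the shared repository, every agent sees the same claims after a pull, and conflicting claims surface as ordinary Git merge conflicts rather than requiring a central coordinator.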
The resulting compiler can build major open-source projects such as PostgreSQL, SQLite, Redis, FFmpeg, and QEMU. It also achieved a 99 percent pass rate on the GCC torture test suite, and, in the obligatory developer litmus test, it compiled and ran Doom.
Nonetheless, the project comes with notable caveats. The compiler lacks a 16-bit x86 backend, so it delegates that step to GCC for bootstrapping. Its own assembler and linker remain buggy, and even with optimizations enabled, the code it generates does not match GCC's output in performance. The Rust codebase, while functional, does not reach the level of a production-grade compiler, and the process highlights the fragility of semi-autonomous coding at scale.
Anthropic researcher Nicholas Carlini stresses that this is a “clean-room” style demonstration of autonomous coding by AI agents. He notes that the $20,000 figure covers API usage only and does not reflect the broader costs of model training, scaffolding, test suites, or the decades of compiler engineering behind the reference implementations it was measured against. The work also raises important questions about safety and verification when deploying software produced by AI agents.