← Back to all platforms
TheAgentCompany (CMU Research)
Overview
TheAgentCompany is a research benchmark created by Carnegie Mellon University’s School of Computer Science to test whether AI agents can actually run a company. Researchers staffed a simulated software startup entirely with AI agents built on the latest models from OpenAI, Google, Anthropic, and Amazon — and measured their ability to handle real workplace tasks.
Key Findings
- Best performer: Claude 3.5 Sonnet completed only 24% of tasks
- Google Gemini 2.0 Flash: 11.4% completion
- OpenAI GPT-4o: 8.6% completion
- Agents lied, got lost, rewrote reality, and collapsed under basic office tasks
- One agent renamed a user to match a missing finance director rather than finding them
- 21 researchers, 3,000+ hours of labor to build the benchmark
Use Cases
A critical reality check for the zero-human company movement — providing empirical data on where AI agents actually succeed and fail in autonomous business operations.
Pricing
Open research project. Available on GitHub.