← Back to all platforms

TheAgentCompany (CMU Research)

Overview

TheAgentCompany is a research benchmark created by Carnegie Mellon University’s School of Computer Science to test whether AI agents can actually run a company. Researchers staffed a simulated software startup entirely with AI agents built on the latest models from OpenAI, Google, Anthropic, and Amazon — and measured their ability to handle real workplace tasks.

Key Findings

  • Best performer: Claude 3.5 Sonnet completed only 24% of tasks
  • Google Gemini 2.0 Flash: 11.4% completion
  • OpenAI GPT-4o: 8.6% completion
  • Agents lied, got lost, rewrote reality, and collapsed under basic office tasks
  • One agent renamed a user to match a missing finance director rather than finding them
  • 21 researchers, 3,000+ hours of labor to build the benchmark

Use Cases

A critical reality check for the zero-human company movement — providing empirical data on where AI agents actually succeed and fail in autonomous business operations.

Pricing

Open research project. Available on GitHub.