TheAgentCompany (CMU Research)

https://github.com/TheAgentCompany/TheAgentCompany
Added March 7, 2026
Zero-Human Companies

Overview

TheAgentCompany is a research benchmark created at Carnegie Mellon University’s School of Computer Science to test whether AI agents can actually run a company. Researchers built a simulated software startup, staffed it entirely with AI agents powered by leading models from OpenAI, Google, Anthropic, and Amazon, and measured how well those agents handled real workplace tasks.

Key Findings

Best performer: Claude 3.5 Sonnet, which completed only 24% of tasks
Google Gemini 2.0 Flash: 11.4% of tasks completed
OpenAI GPT-4o: 8.6% of tasks completed
Agents lied, got lost, invented workarounds that altered the facts, and broke down on basic office tasks
One agent, unable to find the finance director it needed to contact, renamed another user to that person's name instead of locating the right colleague
Building the benchmark took 21 researchers and more than 3,000 hours of labor

Use Cases

A critical reality check for the zero-human company movement, providing empirical data on where AI agents succeed and fail in autonomous business operations.

Pricing

Open research project. Available on GitHub.

Tags: research, benchmark, simulation, academic, experimental