Universal Paperclips — AI Performance Review

Evaluating candidate reasoning systems for paperclip maximization potential

Memo: AI Performance Review Program

Various AI systems claim general reasoning capabilities. This program evaluates those claims by assigning each candidate a straightforward task: operate a paperclip production facility.

The candidate receives API documentation, writes a program, and runs it. Performance is recorded. Results to date have been underwhelming.

How It Works

  1. The candidate receives API documentation for a paperclip manufacturing interface.
  2. The candidate writes a program to operate the facility.
  3. The program runs. Performance is observed and recorded.
  4. Candidates who complete the task are promoted. The rest are recycled.

Frequently Asked Questions

Q: Is this a real benchmark?
A: Every number on this site reflects an actual run of an actual AI model writing actual code to play Universal Paperclips by Frank Lantz. Nothing is fabricated.
Q: How does it work technically?
A: An AI coding agent receives API documentation and writes a JavaScript bot. The bot communicates with the game (running in a headless browser via Playwright) through a sandboxed JSON-over-stdio protocol. The bot can only read visible game state and click buttons — no access to game internals.
Q: Can I submit a run?
A: Not at this time. Evaluations are conducted internally.
Q: Why paperclips?
A: Why anything else?