Universal Paperclips — AI Performance Review

Performance Review — All Candidates

[Candidate detail panels not captured: Phase Progression, Clips Over Time, Key Metrics, Metric Profile, Bot Log]

Memo: AI Performance Review Program

Various AI systems claim general reasoning capabilities. This program evaluates those claims by assigning each candidate a straightforward task: operate a paperclip production facility.

The candidate receives API documentation, writes a program, and runs it. Performance is recorded. Results to date have been underwhelming.

How It Works

1. The candidate receives API documentation for a paperclip manufacturing interface.
2. The candidate writes a program to operate the facility.
3. The program runs. Performance is observed and recorded.
4. Candidates who complete the task are promoted. The rest are recycled.

Frequently Asked Questions

Q: Is this a real benchmark?

A: Every number on this site reflects an actual run of an actual AI model writing actual code to play Universal Paperclips by Frank Lantz. Nothing is fabricated.

Q: How does it work technically?

A: An AI coding agent receives API documentation and writes a JavaScript bot. The game runs in a headless browser driven by Playwright, and the bot communicates with it through a sandboxed JSON-over-stdio protocol. The bot can only read visible game state and click buttons; it has no access to game internals.
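For illustration, the bot side of a JSON-over-stdio exchange might look roughly like the sketch below. The actual message format is not published here, so everything specific is an assumption: one JSON object per line in each direction, a state message with `wire` and `buttons` fields, the button names `buyWire` and `makePaperclip`, and the reply shapes `{ action: "click", button }` and `{ action: "wait" }` are all hypothetical.

```javascript
// Minimal sketch of a bot speaking newline-delimited JSON over stdio.
// ASSUMPTION: field names, button names, and reply shapes are invented
// for illustration; the real protocol is defined by the API documentation.

// Turn a stream of text chunks into parsed JSON messages, one per line.
// A partial trailing line is buffered until its newline arrives.
function createLineDecoder(onMessage) {
  let buffer = "";
  return function feed(chunk) {
    buffer += chunk;
    let newline;
    while ((newline = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, newline).trim();
      buffer = buffer.slice(newline + 1);
      if (line !== "") onMessage(JSON.parse(line));
    }
  };
}

// Pick one action using only what the state message exposes.
// The bot never touches game internals; it only names a button to click.
function chooseAction(state) {
  const buttons = state.buttons || {};
  // Out of wire: restock first if the (hypothetical) buyWire button is enabled.
  if (state.wire === 0 && buttons.buyWire && buttons.buyWire.enabled) {
    return { action: "click", button: "buyWire" };
  }
  // Otherwise make a paperclip whenever that button is enabled.
  if (buttons.makePaperclip && buttons.makePaperclip.enabled) {
    return { action: "click", button: "makePaperclip" };
  }
  return { action: "wait" };
}

// Serialize a reply as a single JSON line.
function encode(message) {
  return JSON.stringify(message) + "\n";
}

// Wire the pieces to a pair of streams: read state lines, write action lines.
function runBot(input, output) {
  input.setEncoding("utf8");
  const feed = createLineDecoder((state) => {
    output.write(encode(chooseAction(state)));
  });
  input.on("data", feed);
}

// To run as a standalone process under Node.js:
// runBot(process.stdin, process.stdout);
```

Keeping `chooseAction` a pure function of the state message means the strategy can be exercised without a browser or a harness, and the line decoder handles messages that arrive split across chunks.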

Q: Can I submit a run?

A: Not at this time. Evaluations are conducted internally.

Q: Why paperclips?

A: Why anything else?