Sub-Agents and the Parent's Clean Context
Merlin now spawns sub-agents inside the agent loop. The honest pitch: this is not a cost-saver. It is a quality multiplier, since each child puts its full attention on one thing, and it adds predictability when input sizes are unknown.
The one-shot arcade: making models perform in public
Nine classic games, each written by an LLM from a single prompt, playable live on the public site. Pyodide runs the Python ones; sandboxed iframes run the HTML ones. A new tier of bench checks runs the programs and asserts on their output. Models that scored 100% on the static checks turned out to ship chess boards that are upside down: the kind of failure you can only catch by playing.
Watching Merlin work: per-tool telemetry and the plugin-first push
A per-tool summary on every run. Tool-usage chips on the benchmarks page. Typed cargo / files / git / node plugins replacing common shell-exec calls. Granular approval flags. And the last three plugins of the cycle were written by Merlin itself.
Discord, Run-Anywhere CLI, and a Better Place to Start
Discord bridge, run-anywhere CLI, NDJSON streaming, a new specsync-create plugin. A tour of what landed in Merlin this cycle.