Blog

Updates, insights, and deep dives from the Merlin team.

May 27, 2026

Making Merlin good at long work

Short tasks were always easy. Long ones were always brittle. Here's the engineering arc that closed the gap — condense, checkpoint, resume, roll back.

engineering, agents, reliability, merlin

May 26, 2026

Five layers deep

We shipped a destructive-op gate and thought safety was done. Over the next six hours of red-teaming we found five more deletion paths, each surfaced while testing the previous fix. Then we ran a six-probe adversarial sweep to confirm the chain holds. Here's the full audit, the sharpening pass, the validation, and what we learned about how safety thinking generalizes.

safety, merlin, agent-loop, lessons, post-mortem

May 21, 2026

Sub-Agents and the Parent's Clean Context

Merlin now spawns sub-agents inside the agent loop. The honest pitch: not a cost-saver. A quality multiplier — each child has its full attention on one thing — plus predictability when input sizes are unknown.

update, merlin, agent-loop

May 20, 2026

The one-shot arcade: making models perform in public

Nine classic games, written in a single prompt by an LLM, playable live on the public site. Pyodide runs the Python ones; sandboxed iframes run the HTML ones. A new tier of bench checks runs the programs and asserts the output. Models that scored 100% on the static checks turned out to ship chess boards that are upside down. The kind of failure you can only catch by playing.

update, merlin, benchmarks

May 17, 2026

Watching Merlin work: per-tool telemetry and the plugin-first push

A per-tool summary on every run. Tool-usage chips on the benchmarks page. Typed cargo / files / git / node plugins replacing common shell-exec calls. Granular approval flags. And the last three plugins of the cycle were written by Merlin itself.

update, merlin, plugins

May 15, 2026

Discord, Run-Anywhere CLI, and a Better Place to Start

Discord bridge, run-anywhere CLI, NDJSON streaming, a new specsync-create plugin. A tour of what landed in Merlin this cycle.

update, merlin

May 9, 2026

Introducing Merlin

Merlin is a spec-driven AI agent runner built on fledge. Here's why we're building it.

announcement, merlin