Blog

Updates, insights, and deep dives from the Merlin team.

Making Merlin good at long work

Short tasks were always easy. Long ones were always brittle. Here's the engineering arc that closed the gap — condense, checkpoint, resume, roll back.

engineering · agents · reliability · merlin

Five layers deep

We shipped a destructive-op gate and thought safety was done. Over the next six hours of red-teaming we found five more deletion paths, each surfaced while testing the previous fix. Then we ran a six-probe adversarial sweep to confirm the chain holds. Here's the full audit, the sharpening pass, the validation, and what we learned about how safety thinking generalizes.

safety · merlin · agent-loop · lessons · post-mortem

Sub-Agents and the Parent's Clean Context

Merlin now spawns sub-agents inside the agent loop. The honest pitch: not a cost-saver. A quality multiplier — each child has its full attention on one thing — plus predictability when input sizes are unknown.

update · merlin · agent-loop

The one-shot arcade: making models perform in public

Nine classic games, each written by an LLM from a single prompt, playable live on the public site. Pyodide runs the Python ones; sandboxed iframes run the HTML ones. A new tier of bench checks runs the programs and asserts the output. Models that scored 100% on the static checks turned out to ship chess boards that are upside down. The kind of failure you can only catch by playing.

update · merlin · benchmarks

Watching Merlin work: per-tool telemetry and the plugin-first push

A per-tool summary on every run. Tool-usage chips on the benchmarks page. Typed cargo / files / git / node plugins replacing common shell-exec calls. Granular approval flags. And the last three plugins of the cycle were written by Merlin itself.

update · merlin · plugins

Discord, Run-Anywhere CLI, and a Better Place to Start

Discord bridge, run-anywhere CLI, NDJSON streaming, a new specsync-create plugin. A tour of what landed in Merlin this cycle.

update · merlin

Introducing Merlin

Merlin is a spec-driven AI agent runner built on fledge. Here's why we're building it.

announcement · merlin