How it works.
Merlin reads your specs, picks a provider, runs fledge plugins for tools, and verifies the output. Here's what that looks like.
Specs go in, correct code comes out.
Merlin reads your module specs before writing a single line of code. Invariants, public API, and error cases become hard constraints in the system prompt.
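As a rough illustration of how spec sections could become prompt constraints, here is a minimal Python sketch. The `ModuleSpec` class, the `build_system_prompt` function, and the prompt wording are all hypothetical; Merlin's actual spec schema and prompt format are not shown on this page.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleSpec:
    """Hypothetical in-memory form of a module spec (not Merlin's real schema)."""
    name: str
    invariants: list[str] = field(default_factory=list)
    public_api: list[str] = field(default_factory=list)
    error_cases: list[str] = field(default_factory=list)


def build_system_prompt(spec: ModuleSpec) -> str:
    """Render each non-empty spec section as a list of mandatory constraints."""
    sections = [
        ("Invariants", spec.invariants),
        ("Public API", spec.public_api),
        ("Error cases", spec.error_cases),
    ]
    lines = [f"You are editing module '{spec.name}'. These constraints are mandatory:"]
    for title, items in sections:
        if not items:
            continue  # skip empty sections rather than emit a bare heading
        lines.append(f"{title}:")
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)


# Example spec; the module and its rules are made up for illustration.
spec = ModuleSpec(
    name="billing",
    invariants=["balance never goes negative"],
    public_api=["charge(account_id, cents) -> Receipt"],
    error_cases=["charge() raises InsufficientFunds when balance < cents"],
)
print(build_system_prompt(spec))
```

The point of the sketch is only that the constraints are assembled before any code is generated, so the model sees them on every turn.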
31 providers, one interface.
Anthropic, OpenAI (12 SKUs including gpt-5/gpt-5.5/o1/o3/o4-mini/4o), OpenRouter (×5 vendors with one key), Groq, Together, and 11 Ollama Cloud models (Qwen3-coder, Kimi K2.5, GLM-4.7, MiniMax M2.5, GPT-OSS, DeepSeek v4, Devstral, Gemma4, plus more). Swap providers with a flag.
Every tool is a plugin you can swap.
Bundled plugins cover filesystem, code search, shell, git, spec-sync, snapshots, runtime checks (rust, ts, python, js, sql), media (vision, voice), in-loop sub-agent delegation, and the Discord + Telegram bridges. Write your own in any language — it's just a binary that speaks JSON-lines.
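To make "a binary that speaks JSON-lines" concrete, here is a minimal Python sketch of a plugin loop: one JSON object per input line, one JSON reply per output line. The `id`, `tool`, `args`, `ok`, `result`, and `error` fields are assumptions made for illustration; the real fledge message schema is not described on this page.

```python
import io
import json
from typing import IO


def handle(request: dict) -> dict:
    """Answer one request. The field names here are assumed, not fledge's."""
    if request.get("tool") == "echo":
        return {"id": request.get("id"), "ok": True, "result": request.get("args", {})}
    return {"id": request.get("id"), "ok": False, "error": "unknown tool"}


def serve(stdin: IO[str], stdout: IO[str]) -> None:
    """Read one JSON object per line and write one JSON reply per line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between messages
        try:
            request = json.loads(line)
            if not isinstance(request, dict):
                raise ValueError("request must be a JSON object")
            reply = handle(request)
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            reply = {"id": None, "ok": False, "error": str(exc)}
        stdout.write(json.dumps(reply) + "\n")
        stdout.flush()  # flush per line so the host is never left waiting


# Demo with in-memory streams. A real plugin would call
# serve(sys.stdin, sys.stdout) instead.
demo_in = io.StringIO('{"id": 1, "tool": "echo", "args": {"x": 1}}\n')
demo_out = io.StringIO()
serve(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

Because the contract is just lines of JSON on standard streams, the same loop could be written in any language that can read stdin and write stdout.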
Delegate work without filling the parent's context.
`subagent-spawn` lets a running agent hand off a self-contained subtask to a child Merlin process. The child runs its own full loop (tool calls, refusals, verification) and returns a compact JSON envelope (summary, files_changed, tool_calls, tokens). Because the parent receives only that envelope, its working memory stays small no matter how many subtasks it fans out. The default tier is `tool` (research surface; no shell-exec, no destructive writes), recursion is capped at depth 2, and children run on the configured default provider, which keeps chained delegation off the parent's frontier-API account.
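The envelope and the depth cap can be sketched in a few lines of Python. The four field names come from the text above, but their types (`tool_calls` and `tokens` as integers), the depth numbering (root agent at depth 0), and the helper names are assumptions for illustration only.

```python
import json
from dataclasses import dataclass

MAX_DEPTH = 2  # recursion cap stated in the text; numbering from 0 is assumed


@dataclass
class Envelope:
    """Compact result a child returns; field types are assumed."""
    summary: str
    files_changed: list[str]
    tool_calls: int
    tokens: int


def can_spawn(current_depth: int) -> bool:
    """An agent may delegate only while it sits above the depth cap."""
    return current_depth < MAX_DEPTH


def parse_envelope(raw: str) -> Envelope:
    """Decode one child's JSON envelope into a typed record."""
    data = json.loads(raw)
    return Envelope(
        summary=data["summary"],
        files_changed=list(data["files_changed"]),
        tool_calls=int(data["tool_calls"]),
        tokens=int(data["tokens"]),
    )


# The parent keeps only the envelopes, never the children's transcripts.
raw_results = [
    '{"summary": "audited auth module", "files_changed": [], "tool_calls": 7, "tokens": 4200}',
    '{"summary": "fixed typo in README", "files_changed": ["README.md"], "tool_calls": 3, "tokens": 900}',
]
envelopes = [parse_envelope(r) for r in raw_results]
for env in envelopes:
    print(env.summary, env.files_changed)
```

Under these assumptions, what the parent retains grows with the number of subtasks times the size of a summary, not with the length of each child's full loop.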
Agents that can see and hear.
The vision plugin sends images to a local Ollama model and returns text descriptions. The voice plugin transcribes audio with Whisper and synthesizes replies with OpenAI tts-1. The same agent loop, with new senses — and the bridges (Discord, Telegram) automatically save attachments where these plugins can find them.
Run Merlin from Discord.
A first-class bridge so your team can @mention Merlin or run slash commands from any Discord channel. Reply chains become threaded sessions, live progress shows the active tool, and each channel keeps its own session context.
…and from Telegram.
A second user-facing channel with the same architecture. The bridge long-polls the Telegram Bot API, spawns merlin for each task, and keeps session continuity per chat. Image and voice attachments route through the media plugins automatically. Slash commands cover /session new|end|status.
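Two pieces of bookkeeping sit behind a long-polling bridge like this: advancing the `offset` passed to the Bot API's `getUpdates` method, and grouping messages by chat. The Python sketch below shows both on plain dictionaries shaped like Telegram `Update` objects. The function names are hypothetical and this is not the bridge's actual code; it makes no network calls.

```python
def next_offset(updates: list[dict], current: int) -> int:
    """Return the offset for the next getUpdates call.

    Telegram treats an update as confirmed once getUpdates is called with an
    offset higher than its update_id, so we use the highest id seen plus one.
    """
    if not updates:
        return current
    return max(u["update_id"] for u in updates) + 1


def group_by_chat(updates: list[dict]) -> dict[int, list[str]]:
    """Collect message texts per chat id, the key a per-chat session would use."""
    by_chat: dict[int, list[str]] = {}
    for update in updates:
        message = update.get("message")
        if message is None:
            continue  # ignore update kinds other than new messages
        by_chat.setdefault(message["chat"]["id"], []).append(message.get("text", ""))
    return by_chat


# Sample batch in the shape getUpdates returns (values are made up).
batch = [
    {"update_id": 10, "message": {"chat": {"id": 1}, "text": "/session new"}},
    {"update_id": 11, "message": {"chat": {"id": 2}, "text": "fix the build"}},
    {"update_id": 12, "message": {"chat": {"id": 1}, "text": "/session status"}},
]
print(next_offset(batch, 0))
print(group_by_chat(batch))
```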
Open protocol. You can read every message.
Merlin is built on fledge-v1, a JSON-lines protocol for agent-tool communication. Every tool call, every response, fully inspectable. Stream the same NDJSON over stdout with `--output ndjson` for scripting.
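For scripting, an NDJSON stream can be consumed line by line with nothing more than a JSON parser. The Python sketch below assumes each event carries a `type` field, which is a guess made for illustration; the real event schema is not documented here. The sample lines stand in for the stdout of a Merlin process started with `--output ndjson`.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON value per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_by_type(events: Iterable[dict]) -> dict[str, int]:
    """Tally events by their (assumed) 'type' field."""
    return dict(Counter(event.get("type", "unknown") for event in events))


# Stand-in for a captured stream; event names and fields are hypothetical.
sample_stream = [
    '{"type": "tool_call", "tool": "fs.read"}\n',
    '{"type": "tool_result", "ok": true}\n',
    '\n',
    '{"type": "tool_call", "tool": "git.diff"}\n',
]
print(count_by_type(iter_events(sample_stream)))
```

Since `iter_events` accepts any iterable of strings, the same function works on `sys.stdin` when the stream is piped in from another process.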