Why Merlin

How it works.

Merlin reads your specs, picks a provider, runs fledge plugins for tools, and verifies the output. Here's what that looks like.

Spec-Driven Development

Specs go in, correct code comes out.

Merlin reads your module specs before writing a single line of code. Invariants, public API, and error cases become hard constraints in the system prompt.
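
One way to picture how spec bullets could turn into prompt constraints is the minimal Python sketch below. It is an illustration, not Merlin's internals: the function names `extract_requirements` and `build_system_prompt` and the prompt wording are hypothetical, and it assumes a spec is Markdown with one `#` title and `-` bullet requirements, like the auth spec in the example.

```python
from typing import List, Tuple


def extract_requirements(spec_text: str) -> Tuple[str, List[str]]:
    """Return (title, requirements) from a Markdown spec.

    Assumption: the first '# ' line is the title and every '- ' line
    is one requirement. Real specs may be richer than this.
    """
    title = ""
    requirements: List[str] = []
    for raw in spec_text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not title:
            title = line[2:].strip()
        elif line.startswith("- "):
            requirements.append(line[2:].strip())
    return title, requirements


def build_system_prompt(spec_text: str) -> str:
    """Render the requirements as numbered constraints (hypothetical wording)."""
    title, requirements = extract_requirements(spec_text)
    lines = [f"You must satisfy every requirement in: {title}"]
    lines += [f"{i}. {req}" for i, req in enumerate(requirements, start=1)]
    return "\n".join(lines)


SPEC = """# Authentication Spec
- JWT tokens, 24h expiry
- Refresh token rotation
- bcrypt password hashing (cost=12)
"""
```

Calling `build_system_prompt(SPEC)` yields a header line followed by three numbered constraints, one per bullet in the spec.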

spec-aware planning
> cat specs/api/auth.md
# Authentication Spec
- JWT tokens, 24h expiry
- Refresh token rotation
- bcrypt password hashing (cost=12)
> merlin "Implement auth per spec"
Implementation matches all spec requirements

Multi-Provider Support

31 providers, one interface.

Anthropic, OpenAI (12 SKUs including gpt-5/gpt-5.5/o1/o3/o4-mini/4o), OpenRouter (×5 vendors with one key), Groq, Together, and 11 Ollama Cloud models (Qwen3-coder, Kimi K2.5, GLM-4.7, MiniMax M2.5, GPT-OSS, DeepSeek v4, Devstral, Gemma4, plus more). Swap providers with a flag.
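
The "one interface" idea can be sketched as a small registry keyed by the provider flag. This is a hedged illustration in Python: the `Provider` dataclass, `PROVIDERS` table, and `run` function are hypothetical names, the completion call is stubbed rather than hitting any real API, and the default models are simply the ones shown in the provider-switching example.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Provider:
    vendor: str
    default_model: str
    complete: Callable[[str], str]  # prompt -> reply; stubbed in this sketch


def _stub(vendor: str) -> Callable[[str], str]:
    # Placeholder for a real network call to the vendor.
    return lambda prompt: f"[{vendor}] {prompt}"


# Hypothetical registry: every provider exposes the same `complete` signature.
PROVIDERS: Dict[str, Provider] = {
    "claude": Provider("Anthropic", "claude-sonnet-4-6", _stub("Anthropic")),
    "openai": Provider("OpenAI", "gpt-4.1-mini", _stub("OpenAI")),
    "ollama": Provider("Ollama Cloud", "qwen3-coder:480b", _stub("Ollama Cloud")),
}


def run(provider_flag: str, task: str) -> str:
    """Look up the provider by flag and run the task through it."""
    try:
        provider = PROVIDERS[provider_flag]
    except KeyError:
        raise ValueError(f"unknown provider: {provider_flag!r}") from None
    banner = f"Using {provider.default_model} via {provider.vendor}"
    return banner + "\n" + provider.complete(task)
```

Because callers only touch `run`, adding a provider in this sketch means adding one registry entry and nothing else.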

provider switching
# Switch providers with one flag
> merlin --provider claude "Refactor auth"
Using claude-sonnet-4-6 via Anthropic
> merlin --provider openai "Refactor auth"
Using gpt-4.1-mini via OpenAI
> merlin --provider ollama "Refactor auth"
Using qwen3-coder:480b via Ollama Cloud

Plugin Architecture

Every tool is a plugin you can swap.

Bundled plugins cover filesystem, code search, shell, git, spec-sync, snapshots, runtime checks (rust, ts, python, js, sql), media (vision, voice), in-loop sub-agent delegation, and the Discord + Telegram bridges. Write your own in any language — it's just a binary that speaks JSON-lines.
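
To make "a binary that speaks JSON-lines" concrete, here is a minimal Python sketch of a toy plugin loop. It assumes plugins receive `tool_call` messages and answer with `tool_result` messages shaped like the protocol trace on this page; the `echo` tool, the `error` field, and the `handle`/`serve` names are hypothetical and not part of any documented fledge API.

```python
import json
from typing import TextIO


def handle(message: object) -> dict:
    """Answer one message. This toy plugin only knows a hypothetical `echo` tool."""
    if not isinstance(message, dict) or message.get("type") != "tool_call":
        return {"type": "tool_result", "error": "expected a tool_call object"}
    if message.get("name") == "echo":
        args = message.get("args")
        text = args.get("text", "") if isinstance(args, dict) else ""
        return {"type": "tool_result", "content": str(text)}
    return {"type": "tool_result", "error": f"unknown tool: {message.get('name')}"}


def serve(stdin: TextIO, stdout: TextIO) -> None:
    """Read one JSON object per line, write one JSON reply per line."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between messages
        try:
            reply = handle(json.loads(line))
        except json.JSONDecodeError as exc:
            reply = {"type": "tool_result", "error": f"bad JSON: {exc.msg}"}
        stdout.write(json.dumps(reply) + "\n")
        stdout.flush()  # flush per line so the caller sees replies promptly
```

Wired to `sys.stdin` and `sys.stdout`, `serve` would behave like a standalone process; the same loop could be written in any language that can read and write lines.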

fledge.toml
[plugins]
files = "plugins/fledge-plugin-files"
search = "plugins/fledge-plugin-search"
shell = "plugins/fledge-plugin-shell"
git = "plugins/fledge-plugin-git"
specsync = "plugins/fledge-plugin-specsync"
vision = "plugins/fledge-plugin-vision"
voice = "plugins/fledge-plugin-voice"
discord-bridge = "plugins/fledge-plugin-discord-bridge"
# Add your own. It's just a binary

Sub-Agents

Delegate work without filling the parent's context.

`subagent-spawn` lets a running agent hand off a self-contained subtask to a child Merlin process. The child runs its own full loop (tool calls, refusals, verification) and returns a compact JSON envelope (summary, files_changed, tool_calls, tokens). The parent's working memory stays small however many subtasks it fans out. Children default to the `tool` tier (a research surface with no shell-exec and no destructive writes), recursion is capped at depth 2, and children run on the configured default provider, which keeps chained delegation off the parent's frontier-API account.
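
The envelope-and-depth-cap idea can be sketched in a few lines of Python. Everything here is illustrative: `spawn_subagent`, `ChildRun`, and the injected `run_child` callable are hypothetical stand-ins for a real child process, truncating the transcript stands in for a real summary, and counting the top-level agent as depth 0 is an assumption about how the cap is measured.

```python
from dataclasses import dataclass
from typing import Callable, List

MAX_DEPTH = 2  # recursion cap stated in the text


@dataclass
class ChildRun:
    """Everything a child produced (stand-in for a real child process)."""
    transcript: str
    files_changed: List[str]
    tool_calls: int
    input_tokens: int
    output_tokens: int


def spawn_subagent(
    label: str,
    prompt: str,
    parent_depth: int,
    run_child: Callable[[str], ChildRun],
    tier: str = "tool",
    summary_chars: int = 200,
) -> dict:
    """Run a child and return only a compact envelope to the parent."""
    depth = parent_depth + 1
    if depth > MAX_DEPTH:
        return {"ok": False, "depth": depth, "error": "max delegation depth exceeded"}
    result = run_child(prompt)
    return {
        "ok": True,
        "label": label,
        "depth": depth,
        "tier": tier,
        # Assumption: a truncated transcript stands in for a real summary.
        "summary": result.transcript[:summary_chars],
        "files_changed": result.files_changed,
        "tool_calls": result.tool_calls,
        "input_tokens": result.input_tokens,
        "output_tokens": result.output_tokens,
    }
```

The point of the design is that the child's full transcript never reaches the parent; only the fixed-size envelope does.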

subagent-spawn
# Sub-agents keep the parent's context small
> subagent-spawn { label: "summarize-files", prompt: "..." }
⚙ subagent-spawn [7.0s] ✓
{ ok: true, depth: 1, tier: "tool",
summary: "Offers file system operations...",
tool_calls: 1, input_tokens: 8312, output_tokens: 42 }
# Parent saw ~250 tokens, not the whole file

Media Plugins

Agents that can see and hear.

The vision plugin sends images to a local Ollama model and returns text descriptions. The voice plugin transcribes audio with Whisper and synthesizes replies with OpenAI tts-1. The same agent loop, with new senses — and the bridges (Discord, Telegram) automatically save attachments where these plugins can find them.

vision + voice
# Agents that can see and hear
> vision-describe "/tmp/screenshot.png"
A web app login form with email + password fields,
a "Forgot password?" link, and a blue Sign In button.
> voice-transcribe "/tmp/voice-note.ogg"
Can you check the auth flow and make sure refresh
tokens rotate after each use?
# Same agent loop, new senses

Discord Bridge

Run Merlin from Discord.

A first-class bridge so your team can @mention Merlin or run slash commands from any Discord channel. Reply chains become threaded sessions, live progress shows the active tool, and each channel keeps its own session context.

bridges/discord
# Discord bridge — talk to Merlin from your server
> @Merlin refactor the auth middleware
Merlin (openrouter | claude-sonnet-4-6)
*Thinking...* ⏱ 12s 🔧 `read_file` 4,210in / 821out
/session new · /plugins · /status

Telegram Bridge

…and from Telegram.

A second user-facing channel with the same architecture. The bridge long-polls the Telegram Bot API, spawns merlin for each task, and keeps per-chat session continuity. Image and voice attachments route through the media plugins automatically. Slash commands cover /session new|end|status.
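
Per-chat session continuity with `/session new|end|status` might look roughly like the Python sketch below. The `SessionStore` class, the session id format, and the reply strings are all hypothetical; the sketch only tracks which session id belongs to which chat and leaves out polling and spawning entirely.

```python
import itertools
from typing import Dict, Optional


class SessionStore:
    """Maps each chat id to at most one active session id (hypothetical design)."""

    def __init__(self) -> None:
        self._active: Dict[int, str] = {}
        self._counter = itertools.count(1)

    def session_for(self, chat_id: int) -> str:
        """Return the chat's active session, creating one on first use."""
        if chat_id not in self._active:
            self._active[chat_id] = f"session-{next(self._counter)}"
        return self._active[chat_id]

    def handle_command(self, chat_id: int, text: str) -> Optional[str]:
        """Handle /session new|end|status; return None for any other text."""
        parts = text.split()
        if len(parts) != 2 or parts[0] != "/session":
            return None
        action = parts[1]
        if action == "new":
            self._active.pop(chat_id, None)  # drop the old session, if any
            return f"started {self.session_for(chat_id)}"
        if action == "end":
            ended = self._active.pop(chat_id, None)
            return f"ended {ended}" if ended else "no active session"
        if action == "status":
            return self._active.get(chat_id, "no active session")
        return "usage: /session new|end|status"
```

Ordinary messages would call `session_for(chat_id)` so that follow-up tasks in the same chat land in the same session until someone ends it.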

bridges/telegram
# Telegram bridge — second user-facing surface
> You: refactor the parser to use Result
Merlin (ollama | qwen3-coder:480b)
_Thinking..._ 12s Tool: files-edit 3,210in / 421out
/start /help /session new|end|status

Fledge Protocol

Open protocol. You can read every message.

Merlin is built on fledge-v1, a JSON-lines protocol for agent-tool communication. Every tool call, every response, fully inspectable. Stream the same NDJSON over stdout with `--output ndjson` for scripting.
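
Because each message is one JSON object per line, a trace can be inspected with a few lines of scripting. The Python sketch below is one hedged example: it assumes only the `type`, `name`, and `content` fields shown in the protocol trace on this page, the helper names `parse_trace` and `summarize` are hypothetical, and skipping `#` lines is a convenience for traces pasted with comments rather than a protocol rule.

```python
import json
from collections import Counter
from typing import Iterable, Iterator, List


def parse_trace(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded message per non-empty, non-comment line."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        yield json.loads(line)


def summarize(lines: Iterable[str]) -> dict:
    """Count messages by type and list the tools that were called, in order."""
    counts: Counter = Counter()
    tools: List[str] = []
    for message in parse_trace(lines):
        counts[message["type"]] += 1
        if message["type"] == "tool_call":
            tools.append(message["name"])
    return {"counts": dict(counts), "tools": tools}


TRACE = """# fledge-v1: JSON-lines protocol
{"type":"tool_call","name":"read","args":{"path":"src/main.rs"}}
{"type":"tool_result","content":"fn main() { ... }"}
{"type":"text","content":"I see the entry point..."}
"""
```

`summarize(TRACE.splitlines())` reports one message of each type and a single `read` tool call; the same function could consume lines captured from `--output ndjson`.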

protocol trace
# fledge-v1: JSON-lines protocol
{"type":"tool_call","name":"read","args":{"path":"src/main.rs"}}
{"type":"tool_result","content":"fn main() { ... }"}
{"type":"text","content":"I see the entry point..."}

Transparency

We publish our benchmarks.

27 test suites, 169 tests, including tool-augmented modes. Updated with every release.

View Benchmarks

Try it out.

Clone, build, run. Pick a provider and point it at your specs.