Spec-aware Planning
Merlin’s headline feature is not “an LLM that calls tools”; those are common. The differentiator is that specs are loaded as constraints before planning and verified after execution.
The Flow
At the start of every task:
- List specs. Merlin invokes the `specsync-list` plugin command, which returns every module spec registered with spec-sync.
- Score relevance. Each spec name is tokenized (lowercased, split on non-alphanumeric characters, short and stop words dropped) and scored by token overlap with the task description.
- Read the top three. For each top match, Merlin invokes `specsync-read <name>` to fetch the full spec content.
- Extract constraint sections. From each spec, Merlin extracts four sections:
  - `## Purpose`
  - `## Invariants`
  - `## Public API`
  - `## Error Cases`

  If a spec has none of those, the full body is used.
- Inject as constraints. The extracted text is added as a spec constraint via `Context::add_spec_constraint`, which appends it to the system prompt under an “Active Spec Constraints” header.
- Tell the LLM. The system prompt explicitly states “You MUST follow these specs. Violations will cause verification failure.”
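The extraction step can be sketched roughly as follows. This is an illustration, not the code in `spec_loader.rs`: the helper name `extract_sections` is hypothetical, and it assumes that a section runs from its `## ` heading to the next `## ` heading and that headings match the four names exactly.

```rust
/// Headings treated as constraint sections, as listed in the flow above.
const CONSTRAINT_HEADINGS: [&str; 4] =
    ["## Purpose", "## Invariants", "## Public API", "## Error Cases"];

/// Hypothetical helper: keep only the constraint sections of a spec.
/// Assumption: a section ends where the next `## ` heading begins.
/// Falls back to the full body when none of the four headings is present.
fn extract_sections(full_content: &str) -> String {
    let mut kept: Vec<&str> = Vec::new();
    let mut keeping = false;
    let mut found_any = false;

    for line in full_content.lines() {
        if line.starts_with("## ") {
            // A level-2 heading either opens a kept section or closes one.
            keeping = CONSTRAINT_HEADINGS.contains(&line.trim_end());
            found_any |= keeping;
        }
        if keeping {
            kept.push(line);
        }
    }

    if found_any {
        kept.join("\n")
    } else {
        full_content.to_string()
    }
}
```

Under these assumptions, a spec containing `## Purpose`, `## Changelog`, and `## Invariants` would yield only the first and third sections, while a spec with no matching heading is passed through unchanged.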
After execution, the verify lane (which includes `fledge spec check`) confirms the changes don’t break the contract. Any failure is fed back to the LLM for another iteration, bounded by `max_retries`.
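The bounded feedback loop might look like the sketch below. The function name `run_with_verification` and the two closures (standing in for an LLM turn and the verify lane) are hypothetical. The text does not say whether `max_retries` counts all attempts or only those after the first; this sketch assumes the latter.

```rust
/// Hypothetical outcome of one verify-lane run: Ok, or a failure message.
type VerifyResult = Result<(), String>;

/// Runs `execute` then `verify`, handing each failure message to the next
/// attempt. Assumption: `max_retries` counts attempts after the first one.
/// Returns the number of attempts used, or the last failure message.
fn run_with_verification<E, V>(
    mut execute: E,
    mut verify: V,
    max_retries: usize,
) -> Result<usize, String>
where
    E: FnMut(Option<&str>),
    V: FnMut() -> VerifyResult,
{
    let mut feedback: Option<String> = None;
    for attempt in 0..=max_retries {
        // The previous failure, if any, is passed into this attempt.
        execute(feedback.as_deref());
        match verify() {
            Ok(()) => return Ok(attempt + 1),
            Err(msg) => feedback = Some(msg),
        }
    }
    Err(feedback.unwrap_or_default())
}
```

The first attempt receives no feedback; each later attempt receives the message from the verification failure that preceded it.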
Example
Task: "add error handling to the parser"
Spec registry:
- `agent`
- `provider`
- `parser`
- `fledge-protocol`

Token overlap scoring:
- `parser` → 1 (matches “parser”)
- `agent`, `provider`, `fledge-protocol` → 0

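The scoring in this example can be reproduced with a short sketch. The names `tokenize` and `rank_specs`, the stop-word list, and the three-character minimum token length are all assumptions made for illustration; the real list and thresholds live in `spec_loader.rs`. Dropping zero-score specs is inferred from the example, where only `parser` is picked.

```rust
use std::collections::HashSet;

/// Illustrative stop-word list; not the project's actual list.
const STOP_WORDS: [&str; 6] = ["the", "to", "and", "of", "in", "for"];

/// Lowercase, split on non-alphanumeric characters, drop short and stop words.
/// Assumption: "short" means fewer than 3 characters.
fn tokenize(text: &str) -> HashSet<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| t.len() >= 3 && !STOP_WORDS.contains(t))
        .map(str::to_string)
        .collect()
}

/// Score each spec name by the number of tokens it shares with the task and
/// return up to `top_n` names with a non-zero score, best first.
fn rank_specs(task: &str, all_specs: &[&str], top_n: usize) -> Vec<(String, usize)> {
    let task_tokens = tokenize(task);
    let mut scored: Vec<(String, usize)> = all_specs
        .iter()
        .map(|name| {
            let overlap = tokenize(name).intersection(&task_tokens).count();
            (name.to_string(), overlap)
        })
        .filter(|(_, score)| *score > 0)
        .collect();
    // Stable sort: ties keep their registry order.
    scored.sort_by(|a, b| b.1.cmp(&a.1));
    scored.truncate(top_n);
    scored
}
```

Calling `rank_specs("add error handling to the parser", &["agent", "provider", "parser", "fledge-protocol"], 3)` gives a single entry, `parser` with a score of 1, matching the table above.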
Top picks: [parser]. Merlin reads specs/parser/parser.spec.md,
extracts the four sections, and prepends them to the system prompt:
## Active Spec Constraints
You MUST follow these specs. Violations will cause verification failure.
---
# Spec: parser
## Purpose
The parser module turns raw input into structured AST nodes...
## Invariants
1. UTF-8 input only.
2. Position info preserved for every node.
...
---
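A constraint block of this shape could be assembled as in the sketch below. The function name `render_constraints` is hypothetical, and the exact separators and ordering are read off the example rather than taken from `Context::add_spec_constraint` itself.

```rust
/// Hypothetical renderer for the "Active Spec Constraints" block.
/// Each input pair is (spec name, already-extracted constraint sections).
/// Assumption: specs are separated by `---` lines, as in the example above.
fn render_constraints(specs: &[(&str, &str)]) -> String {
    let mut out = String::from(
        "## Active Spec Constraints\n\
         You MUST follow these specs. Violations will cause verification failure.\n\
         ---\n",
    );
    for (name, sections) in specs {
        // One titled block per spec, closed by a separator line.
        out.push_str(&format!("# Spec: {}\n{}\n---\n", name, sections.trim_end()));
    }
    out
}
```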
Implementation
The relevance matcher is in `crates/merlin-core/src/spec_loader.rs`. It exposes two pure functions, both unit-tested:
- `select_relevant_specs(task: &str, all_specs: &[String], top_n: usize) -> Vec<SpecRef>`
- `extract_constraint_sections(spec_name: &str, full_content: &str) -> String`

The agent loop calls them in `Agent::load_relevant_specs`, which is invoked from `Agent::run_task` between memory recall and the first LLM turn.
Tuning
- The `top_n` parameter is currently hardcoded to `3` in `agent.rs`. If the task touches many modules, you may want more; tasks at module boundaries usually hit cleanly with 3.
- The stop-word list (in `spec_loader.rs`) is small and intentionally generic. If your project’s modules use names that collide with English stop words, consider extending it.
- Keyword overlap is unweighted: every match is worth 1 point. For larger spec corpora, a TF-IDF or embedding-based scorer would be more accurate. A `RelevanceScorer` trait would be the natural extension point.
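One possible shape for such an extension point is sketched below. No `RelevanceScorer` trait exists in the codebase as described, so the trait signature, the `KeywordOverlap` struct, and the `select_top` and `simple_tokens` helpers are all hypothetical. The tokenizer here is deliberately simplified and skips stop-word and length filtering.

```rust
use std::collections::HashSet;

/// Hypothetical extension point: any strategy that scores one spec name
/// against a task description. Higher means more relevant.
trait RelevanceScorer {
    fn score(&self, task: &str, spec_name: &str) -> f64;
}

/// Unweighted keyword overlap behind the trait: one point per shared token.
struct KeywordOverlap;

impl RelevanceScorer for KeywordOverlap {
    fn score(&self, task: &str, spec_name: &str) -> f64 {
        let task_tokens = simple_tokens(task);
        let overlap = simple_tokens(spec_name).intersection(&task_tokens).count();
        overlap as f64
    }
}

/// Simplified tokenizer for this sketch (no stop-word or length filtering).
fn simple_tokens(text: &str) -> HashSet<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_string)
        .collect()
}

/// Selection written against the trait, so a TF-IDF or embedding-based
/// scorer could be swapped in without changing the caller.
fn select_top(
    scorer: &dyn RelevanceScorer,
    task: &str,
    specs: &[&str],
    top_n: usize,
) -> Vec<String> {
    let mut scored: Vec<(f64, &str)> = specs
        .iter()
        .map(|name| (scorer.score(task, name), *name))
        .filter(|(score, _)| *score > 0.0)
        .collect();
    // Highest score first; stable sort keeps registry order among ties.
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    scored
        .into_iter()
        .take(top_n)
        .map(|(_, name)| name.to_string())
        .collect()
}
```

Returning `f64` rather than an integer count leaves room for weighted schemes such as TF-IDF, at the cost of needing `total_cmp` for ordering.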
Spec-write-first (Future)
The design spec calls for an additional flow: when the LLM is creating
a new module, it should write the spec first, register it via
specsync-register, and then implement. This is not yet wired up. See
the project plan for status.