Spec-aware Planning

Merlin’s headline feature is not “an LLM that calls tools” — those are common. The differentiator is that specs are loaded as constraints before planning, and verified after execution.

The Flow

At the start of every task:

  1. List specs. Merlin invokes the specsync-list plugin command, which returns every module spec registered with spec-sync.
  2. Score relevance. Each spec name is tokenized (lowercased, non-alphanumeric split, short/stop words dropped) and scored by token overlap with the task description.
  3. Read the top three. For each top match, Merlin invokes specsync-read <name> to fetch the full spec content.
  4. Extract constraint sections. From each spec, Merlin extracts four sections:
    • ## Purpose
    • ## Invariants
    • ## Public API
    • ## Error Cases If a spec has none of those, the full body is used.
  5. Inject as constraints. The extracted text is added as a spec constraint via Context::add_spec_constraint, which appends it to the system prompt under an “Active Spec Constraints” header.
  6. Tell the LLM. The system prompt explicitly states “You MUST follow these specs. Violations will cause verification failure.”

After execution, the verify lane (which includes fledge spec check) confirms the changes don’t break the contract — and any failure is fed back to the LLM for another iteration, bounded by max_retries.

Example

Task: "add error handling to the parser"

Spec registry:

  • agent
  • provider
  • parser
  • fledge-protocol

Token overlap scoring:

  • parser → 1 (matches “parser”)
  • agent, provider, fledge-protocol → 0

Top picks: [parser]. Merlin reads specs/parser/parser.spec.md, extracts the four sections, and prepends them to the system prompt:

## Active Spec Constraints

You MUST follow these specs. Violations will cause verification failure.

---
# Spec: parser

## Purpose
The parser module turns raw input into structured AST nodes...

## Invariants
1. UTF-8 input only.
2. Position info preserved for every node.
...
---

Implementation

The relevance matcher is in crates/merlin-core/src/spec_loader.rs. It exposes two pure functions, both unit-tested:

  • select_relevant_specs(task: &str, all_specs: &[String], top_n: usize) -> Vec<SpecRef>
  • extract_constraint_sections(spec_name: &str, full_content: &str) -> String

The agent loop calls them in Agent::load_relevant_specs, which is invoked from Agent::run_task between memory recall and the first LLM turn.

Tuning

  • The top_n parameter is currently hardcoded to 3 in agent.rs. If the task touches many modules, you may want more; tasks at module boundaries usually hit cleanly with 3.
  • The stop-word list (in spec_loader.rs) is small and intentionally generic. If your project’s modules use names that collide with English stop words, consider extending it.
  • Keyword overlap is unweighted — every match is worth 1 point. For larger spec corpora, a TF-IDF or embedding-based scorer would be more accurate. A RelevanceScorer trait would be the natural extension point.

Spec-write-first (Future)

The design spec calls for an additional flow: when the LLM is creating a new module, it should write the spec first, register it via specsync-register, and then implement. This is not yet wired up. See the project plan for status.