0001 — Windows: bundled Ollama vs require-install¶

Status: Decision recorded · 2026-05-27 Closes: #142 Parent tracker: #93 (roadmap), #106 (Windows port)

Context¶

Quill's polish pass is the difference between "raw STT output" and "the sentence you wanted to dictate". On the two platforms we already ship, the polish pipeline is zero-setup: the .deb / AppImage drops the Ollama binary at /usr/libexec/quill/bin/ollama plus the cpu_avx / cpu_avx2 runners under lib/ollama/runners/, the macOS .app carries ollama inside Contents/Resources/, and quill-polish::backend::find_bundled_ollama() resolves them relative to current_exe() at runtime (see crates/quill-polish/src/backend.rs:332-396). The user installs Quill, hits the hotkey, and the daemon spawns its own Ollama subprocess at a private host:port — never touching whatever the user might have installed system-wide.

The Windows port (#93, #106) needs a story for the same pipeline. The options below were investigated against Ollama v0.24.0 release assets and the upstream Windows docs, which materially settle the redistribution and packaging questions.

Facts established by the investigation¶

Question	Answer	Source
Is Ollama MIT-licensed?	Yes — MIT permits binary redistribution provided the copyright + license text travels with the bundle.	ollama/ollama LICENSE
Is there a standalone Windows zip alongside the installer?	Yes — `ollama-windows-amd64.zip` (~2.07 GB CPU+CUDA), plus optional `ollama-windows-amd64-rocm.zip` (AMD) and `ollama-windows-amd64-mlx.zip` (NVIDIA MLX).	v0.24.0 release assets
Does the upstream project endorse embedding the zip in another app?	Yes — verbatim from `docs/windows.mdx`: "This allows for embedding Ollama in existing applications, or running it as a system service via `ollama serve`."	Upstream Windows docs
Does the installer require admin?	No — installs in the user's home dir; binaries land at `%LOCALAPPDATA%\Programs\Ollama`, logs at `%LOCALAPPDATA%\Ollama`, models at `%HOMEPATH%\.ollama\models`.	Upstream Windows docs
Does the installer register a Windows service?	No — it starts a tray app per user session. A service is opt-in via `nssm`.	Upstream Windows docs
Are `OLLAMA_HOST` / `OLLAMA_MODELS` honored the same way as Linux/macOS?	Yes — the same envs we already set in `spawn_bundled()` work unchanged.	Upstream FAQ
Does the CPU-only zip exist as a smaller variant?	No — the base `ollama-windows-amd64.zip` already bundles CUDA libraries. ROCm/MLX are additional zips you overlay into the same directory. There is no published CPU-only slim build.	v0.24.0 release assets

The third row is the load-bearing one: redistribution isn't a gray area. Upstream explicitly invites the embedding path we're considering.

Options¶

A — Bundle the Windows binary into the `.msi` / `.exe` (mirror Linux/macOS)¶

Extract ollama-windows-amd64.zip into the Quill install tree at Quill\bin\ollama\ (or similar), spawn it from the daemon the same way spawn_bundled() does today, point OLLAMA_HOST at a private loopback port, point OLLAMA_MODELS at a Quill-managed directory.

Pros - Zero-setup parity with macOS and Linux. The Windows experience matches the shipping platforms. - find_bundled_ollama() already has the ollama.exe branch (crates/quill-polish/src/backend.rs:391) — discovery is solved. - No coexistence problems: a private OLLAMA_HOST keeps us isolated from a user-installed Ollama tray app already bound on :11434. - Upstream explicitly sanctions the embedding pattern, so a future license review doesn't reopen the question.

Cons / blockers - Installer size. The base zip is ~2.07 GB (the bundled CUDA libs are the bulk). Even after deleting the CUDA blobs to mirror the Linux strategy (Linux today ships only cpu_avx + cpu_avx2 and hard-skips CUDA / ROCm) the residue is still on the order of hundreds of MB. The .msi will dwarf the Linux .deb (~30 MB) and the macOS .dmg. - Slim Windows variant doesn't exist. Linux gives us a tarball layout we can tar -xzf … ./bin/ollama ./lib/ollama/runners/cpu_avx* and skip the GPU runners cleanly. The Windows zip's GPU DLLs live alongside ollama.exe and ollama runner.exe; we'd need to script a curated extraction (likely: keep ollama.exe, the inference runner, and ggml*.dll; drop CUDA/cuDNN DLLs) and re-validate the curation every Ollama release. That's a recurring tax on packaging. - Auto-update conflict. Ollama's tray app auto-updates itself; our bundled copy doesn't. If the user runs the tray app for unrelated reasons they'll have a newer Ollama on PATH than the one Quill spawns. Not fatal — we always invoke our own binary by absolute path — but it's a support-channel question we'll absorb ("which Ollama is Quill using?"). - Code signing. The standalone ollama.exe is signed by the Ollama team. We re-sign as part of our .msi build; whether re-signing invalidates upstream's signature for SmartScreen reputation purposes needs a one-shot test. Low risk, but unknown.

B — Require-install (point users at ollama.com)¶

Detect a system Ollama on 127.0.0.1:11434 (health_check already exists in crates/quill-polish/src/backend.rs:399). If absent, surface an in-app dialog with a one-click "Download Ollama" button that opens https://ollama.com/download/windows. Raw transcription still works; polish pass shows "disabled — install Ollama" until detection succeeds.

Pros - Tiny .msi (the Linux baseline is ~30 MB; Windows would land similarly). - No license / redistribution / re-signing surface at all. - Users who already have Ollama (a non-trivial slice on dev-leaning Windows laptops) just work, day one. - Aligns with how every other Ollama-consuming Windows app (Open WebUI desktop, AnythingLLM, Msty, etc.) handles it — there's no precedent we'd be breaking by not bundling.

Cons / blockers - Onboarding regression vs macOS and Linux. The "what is Ollama" gulp is real for mid-tech-literate users — the alpha cohort skews technical, but the beta cohort (#93 milestone label) is broader. - Two-step setup ("install Quill, then install Ollama") inflates the bounce rate during the install funnel. We have no data on this yet, but the macOS / Linux flows were explicitly designed around one-step setup. - The detection-fallback path is more UI work than the bundled path: we need a dedicated "polish disabled" state, a download CTA, and a re-probe trigger when the user comes back after installing.

C — Embedded `llama.cpp` instead (skip Ollama entirely on Windows)¶

Build the quill-polish crate's existing embedded-llama feature into the Windows release. Ship a GGUF model alongside the .msi (or download on first run, same flow as the Whisper STT model). No Ollama subprocess, no HTTP server, no localhost port-binding.

The scaffolding already exists — crates/quill-polish/src/embedded.rs implements the worker-thread LlamaBackend pattern behind the embedded-llama Cargo feature, gated off by default because the llama-cpp-2 transitive cmake build is heavy (~2 min cold) and most users get Ollama-fast-enough.

Pros - Smaller install footprint than option A — no separate Ollama tree, no HTTP server, just the llama-cpp-2 static lib + the GGUF. - Cleanest privacy story: no localhost server, no port-binding, no third subprocess. The embedded.rs module header lists this as the explicit motivation. - Cross-platform consistency once we flip the feature on everywhere — no more "Linux uses cpu_avx2 runner, macOS uses Metal, Windows uses CUDA DLLs" matrix. - No upstream auto-update conflict (because no upstream binary).

Cons / blockers - Output quality gate resolved for closed alpha. This ADR predates the Qwen3 chat-template path. Embedded polish now renders the GGUF's native chat template, disables thinking output, stops on ChatML/EOS markers, and falls back to raw on suspiciously long output. Windows still needs packaging proof before it can depend on embedded polish, but chat-template quality is no longer the blocker called out in the original decision. - Cancellation is unfinished. Per embedded.rs: "A pending polish() doesn't yet observe the tokio cancellation signal." - GPU acceleration on Windows isn't free. Ollama ships a curated CUDA + ROCm + Vulkan matrix; we'd own the equivalent for llama-cpp-2. - Cross-platform commitment. Flipping embedded-llama on for Windows but not for macOS / Linux means two polish backends to maintain. Flipping it on everywhere is a much larger ask than the Windows port itself.

Decision¶

Adopt Option A (bundle), with the Option B fallback path retained as a second-line defense.

Concretely: 1. The Windows .msi ships ollama.exe extracted from ollama-windows-amd64.zip plus the minimum runner / GGML DLL set needed to serve CPU inference. CUDA / ROCm / MLX variants are not bundled in the v1 cut — users with GPUs install Ollama themselves and select polish_backend=system, the same escape hatch we offer Linux NVIDIA / AMD users today (packaging/linux/package.sh:120-122). 2. find_bundled_ollama() gets a Windows-specific candidate path alongside the existing macOS / Linux ones — the ollama.exe branch at line 391 already covers the filename, but the location list (lines 338-388) is Linux-/macOS-only and needs a sibling bin\ollama.exe candidate matching wherever the .msi lands. 3. If the bundled binary is missing, polish_backend=auto fails visibly instead of silently touching a user-managed Ollama. Users who explicitly choose polish_backend=system get the system-Ollama probe on :11434 via BackendConfig (PolishBackend::System arm, crates/quill-polish/src/backend.rs:151-175). 4. If neither bundled nor system Ollama responds, the daemon surfaces a structured "polish unavailable" state and the GUI shows the Option-C-style "Install Ollama" CTA. Raw STT continues to work.

Option C (embedded-llama) stays on the roadmap as the eventual replacement for all three platforms, but the chat-template and cancellation work in embedded.rs are prerequisites, and treating Windows as the forcing function for that migration would couple two risky workstreams. Better to ship Windows on the same Ollama pipe the other platforms run today and migrate everyone together later.

Consequences¶

What changes¶

packaging/windows/package.ps1 (new — to be authored in a follow-up ticket) downloads ollama-windows-amd64.zip, extracts a curated subset, signs it alongside quill-app.exe + quill-daemon.exe, and rolls everything into the .msi via WiX or cargo-wix.
crates/quill-polish/src/backend.rs::bundled_candidates() gains a Windows arm pointing at the .msi install root.
specs/polish/polish.spec.md gets a Windows section codifying the bundled-binary-with-system-fallback contract (was tracked under the #142 "Tooling > Specs" bullet — that update lands with the implementation PR, not this decision doc).
The redistribution NOTICE in packaging/ grows an Ollama line (MIT attribution + a copy of upstream's LICENSE text inside the installed tree).

Work unlocked¶

The Windows port (#106) gets a concrete polish-path target instead of an open question. Beta milestone (beta label) can proceed without waiting on embedded-llama maturity.

Tickets to spin off¶

Implement Windows packaging lane. New ticket for the packaging/windows/package.ps1 script + signed .msi build. Block on this decision; unblocked by it.
Curate the Windows Ollama subset. Determine the minimum DLL set inside ollama-windows-amd64.zip for CPU inference, validate that removing CUDA / cuDNN DLLs doesn't break ollama serve, document the curation so it's reproducible across Ollama version bumps.
Windows polish-disabled UI. The "no Ollama" CTA in the GUI for the rare case where both the bundled and system paths fail. Mostly UI work in quill-app.
Spec the Windows polish path. Update specs/polish/polish.spec.md with a Windows section + bump version. Lands with the implementation PR.
Resign + SmartScreen reputation test. One-shot manual test: re-sign the upstream ollama.exe with our CorvidLabs cert and confirm SmartScreen doesn't downgrade the signal. If it does, widen the scope of the packaging ticket to handle it.

What this decision explicitly defers¶

The embedded-llama migration. It remains on the roadmap; the chat-template work in embedded.rs is the gating dependency, not the Windows port.
GPU acceleration on Windows. Power users opt in via polish_backend=system + a manual Ollama install, mirroring Linux today.