0001 — Windows: bundled Ollama vs require-install
Status: Decision recorded · 2026-05-27 · Closes: #142 · Parent tracker: #93 (roadmap), #106 (Windows port)
Context
Quill's polish pass is the difference between "raw STT output" and "the
sentence you wanted to dictate". On the two platforms we already ship,
the polish pipeline is zero-setup: the .deb / AppImage drops the
Ollama binary at /usr/libexec/quill/bin/ollama plus the
cpu_avx / cpu_avx2 runners under lib/ollama/runners/, the macOS
.app carries ollama inside Contents/Resources/, and
quill-polish::backend::find_bundled_ollama() resolves them relative
to current_exe() at runtime (see
crates/quill-polish/src/backend.rs:332-396). The user installs Quill,
hits the hotkey, and the daemon spawns its own Ollama subprocess at a
private host:port — never touching whatever the user might have
installed system-wide.
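The lookup described above can be sketched roughly as follows. This is not the code in backend.rs; the function names `sketch_candidates` and `find_first_existing` and the two relative layouts are assumptions chosen for illustration, using only the standard library.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: list places a bundled Ollama binary might live,
/// relative to the directory that contains the running executable.
/// The layouts are illustrative, not the real candidate list.
fn sketch_candidates(exe_dir: &Path) -> Vec<PathBuf> {
    let name = if cfg!(windows) { "ollama.exe" } else { "ollama" };
    vec![
        // A bin directory that sits next to the executable.
        exe_dir.join("bin").join(name),
        // macOS-style bundle: Contents/MacOS/<exe> -> Contents/Resources/ollama.
        exe_dir.join("..").join("Resources").join(name),
    ]
}

/// Return the first candidate that exists as a regular file, if any.
fn find_first_existing(candidates: &[PathBuf]) -> Option<PathBuf> {
    candidates.iter().find(|p| p.is_file()).cloned()
}
```

A caller would typically derive `exe_dir` from `std::env::current_exe()` and its `parent()`, then pass the result of `sketch_candidates` to `find_first_existing`.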
The Windows port (#93, #106) needs a story for the same pipeline. The options below were investigated against Ollama v0.24.0 release assets and the upstream Windows docs, which materially settle the redistribution and packaging questions.
Facts established by the investigation
| Question | Answer | Source |
|---|---|---|
| Is Ollama MIT-licensed? | Yes. MIT permits binary redistribution provided the copyright + license text travels with the bundle. | ollama/ollama LICENSE |
| Is there a standalone Windows zip alongside the installer? | Yes. ollama-windows-amd64.zip (~2.07 GB CPU+CUDA), plus optional ollama-windows-amd64-rocm.zip (AMD) and ollama-windows-amd64-mlx.zip (NVIDIA MLX). | v0.24.0 release assets |
| Does the upstream project endorse embedding the zip in another app? | Yes. Verbatim from docs/windows.mdx: "This allows for embedding Ollama in existing applications, or running it as a system service via ollama serve." | Upstream Windows docs |
| Does the installer require admin? | No. It installs in the user's home dir; binaries land at %LOCALAPPDATA%\Programs\Ollama, logs at %LOCALAPPDATA%\Ollama, models at %HOMEPATH%\.ollama\models. | Upstream Windows docs |
| Does the installer register a Windows service? | No. It starts a tray app per user session. A service is opt-in via nssm. | Upstream Windows docs |
| Are OLLAMA_HOST / OLLAMA_MODELS honored the same way as Linux/macOS? | Yes. The same envs we already set in spawn_bundled() work unchanged. | Upstream FAQ |
| Does the CPU-only zip exist as a smaller variant? | No. The base ollama-windows-amd64.zip already bundles CUDA libraries. ROCm/MLX are additional zips you overlay into the same directory. There is no published CPU-only slim build. | v0.24.0 release assets |
The third row is the load-bearing one: redistribution isn't a gray area. Upstream explicitly invites the embedding path we're considering.
Options
A — Bundle the Windows binary into the .msi / .exe (mirror Linux/macOS)
Extract ollama-windows-amd64.zip into the Quill install tree at
Quill\bin\ollama\ (or similar), spawn it from the daemon the same
way spawn_bundled() does today, point OLLAMA_HOST at a private
loopback port, point OLLAMA_MODELS at a Quill-managed directory.
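As a rough sketch of that spawn setup, assuming nothing about the real spawn_bundled() beyond the two environment variables named above: the helper names `pick_loopback_port` and `private_ollama_command` are hypothetical, and the command is only built, not started.

```rust
use std::net::TcpListener;
use std::path::Path;
use std::process::Command;

/// Ask the OS for a currently free loopback port by binding to port 0.
/// The listener is dropped on return, so another process could in principle
/// take the port before Ollama binds it; a real implementation might retry.
fn pick_loopback_port() -> std::io::Result<u16> {
    let listener = TcpListener::bind(("127.0.0.1", 0))?;
    Ok(listener.local_addr()?.port())
}

/// Build (but do not spawn) the command for a private Ollama server.
/// `ollama_exe` and `models_dir` are supplied by the caller.
fn private_ollama_command(ollama_exe: &Path, models_dir: &Path, port: u16) -> Command {
    let mut cmd = Command::new(ollama_exe);
    cmd.arg("serve")
        // Private loopback address, so a tray app on :11434 is never touched.
        .env("OLLAMA_HOST", format!("127.0.0.1:{port}"))
        // Quill-managed model store instead of the user's own directory.
        .env("OLLAMA_MODELS", models_dir);
    cmd
}
```

Calling `.spawn()` on the returned `Command` would start the subprocess. Invoking the binary by absolute path is what keeps the bundled copy independent of whatever Ollama is on PATH.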
Pros
- Zero-setup parity with macOS and Linux. The Windows experience matches
the shipping platforms.
- find_bundled_ollama() already has the ollama.exe branch
(crates/quill-polish/src/backend.rs:391) — discovery is solved.
- No coexistence problems: a private OLLAMA_HOST keeps us isolated
from a user-installed Ollama tray app already bound on :11434.
- Upstream explicitly sanctions the embedding pattern, so a future
license review doesn't reopen the question.
Cons / blockers
- Installer size. The base zip is ~2.07 GB (the bundled CUDA libs
are the bulk). Even after deleting the CUDA blobs to mirror the
Linux strategy (Linux today ships only cpu_avx + cpu_avx2 and
hard-skips CUDA / ROCm) the residue is still on the order of
hundreds of MB. The .msi will dwarf the Linux .deb (~30 MB)
and the macOS .dmg.
- Slim Windows variant doesn't exist. Linux gives us a tarball
layout we can tar -xzf … ./bin/ollama ./lib/ollama/runners/cpu_avx*
and skip the GPU runners cleanly. The Windows zip's GPU DLLs live
alongside ollama.exe and ollama runner.exe; we'd need to script
a curated extraction (likely: keep ollama.exe, the inference
runner, and ggml*.dll; drop CUDA/cuDNN DLLs) and re-validate the
curation every Ollama release. That's a recurring tax on packaging.
- Auto-update conflict. Ollama's tray app auto-updates itself; our
bundled copy doesn't. If the user runs the tray app for unrelated
reasons they'll have a newer Ollama on PATH than the one Quill spawns.
Not fatal — we always invoke our own binary by absolute path — but
it's a support-channel question we'll absorb ("which Ollama is Quill
using?").
- Code signing. The standalone ollama.exe is signed by the Ollama
team. We re-sign as part of our .msi build; whether re-signing
invalidates upstream's signature for SmartScreen reputation purposes
needs a one-shot test. Low risk, but unknown.
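The curated extraction mentioned in the second bullet could be expressed as a filename filter along these lines. Everything here is an assumption for illustration: the function name `keep_in_cpu_bundle`, the GPU name markers, and the idea of passing exact names to keep. The real DLL set has to come from inspecting the zip for each Ollama release.

```rust
/// Decide whether a file from the upstream zip belongs in a CPU-only bundle.
/// `exact_keep` lists file names to keep verbatim (for example the server
/// binary and the inference runner). The GPU markers are guesses, not a
/// verified list of what the zip contains.
fn keep_in_cpu_bundle(file_name: &str, exact_keep: &[&str]) -> bool {
    let name = file_name.to_ascii_lowercase();
    // Drop anything that looks like a GPU runtime library, even ggml-* ones.
    const GPU_MARKERS: [&str; 4] = ["cuda", "cublas", "cudnn", "rocm"];
    if GPU_MARKERS.iter().any(|m| name.contains(m)) {
        return false;
    }
    // Keep explicitly listed binaries and the remaining ggml*.dll libraries.
    exact_keep.iter().any(|k| k.eq_ignore_ascii_case(&name))
        || (name.starts_with("ggml") && name.ends_with(".dll"))
}
```

Keeping the rule in one predicate makes the per-release re-validation a matter of diffing the kept file list against the previous release.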
B — Require-install (point users at ollama.com)
Detect a system Ollama on 127.0.0.1:11434 (health_check already
exists in crates/quill-polish/src/backend.rs:399). If absent, surface
an in-app dialog with a one-click "Download Ollama" button that opens
https://ollama.com/download/windows. Raw transcription still works;
polish pass shows "disabled — install Ollama" until detection succeeds.
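A coarse stand-in for the detection step is sketched below. The existing health_check is not shown in this document, so this is not a copy of it; the function name `port_accepts_connections` is hypothetical, and a TCP connect only proves that something is listening, not that it is Ollama.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Report whether something accepts TCP connections at `addr` within `timeout`.
/// A successful connect does not prove the listener is Ollama; a fuller probe
/// would also issue an HTTP request and inspect the response.
fn port_accepts_connections(addr: SocketAddr, timeout: Duration) -> bool {
    TcpStream::connect_timeout(&addr, timeout).is_ok()
}
```

For the default system install the address would be `"127.0.0.1:11434".parse().unwrap()`. A short timeout keeps the re-probe cheap enough to run each time the user returns to the app after installing.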
Pros
- Tiny .msi (the Linux baseline is ~30 MB; Windows would land
similarly).
- No license / redistribution / re-signing surface at all.
- Users who already have Ollama (a non-trivial slice on dev-leaning
Windows laptops) just work, day one.
- Aligns with how every other Ollama-consuming Windows app (Open
WebUI desktop, AnythingLLM, Msty, etc.) handles it — there's no
precedent we'd be breaking by not bundling.
Cons / blockers
- Onboarding regression vs macOS and Linux. The "what is Ollama" gulp
is real for mid-tech-literate users: the alpha cohort skews technical,
but the beta cohort (#93 milestone label) is broader.
- Two-step setup ("install Quill, then install Ollama") inflates the
bounce rate during the install funnel. We have no data on this yet,
but the macOS / Linux flows were explicitly designed around one-step
setup.
- The detection-fallback path is more UI work than the bundled path:
we need a dedicated "polish disabled" state, a download CTA, and a
re-probe trigger when the user comes back after installing.
C — Embedded llama.cpp instead (skip Ollama entirely on Windows)
Build the quill-polish crate's existing embedded-llama feature into
the Windows release. Ship a GGUF model alongside the .msi (or
download on first run, same flow as the Whisper STT model). No Ollama
subprocess, no HTTP server, no localhost port-binding.
The scaffolding already exists — crates/quill-polish/src/embedded.rs
implements the worker-thread LlamaBackend pattern behind the
embedded-llama Cargo feature, gated off by default because the
llama-cpp-2 transitive cmake build is heavy (~2 min cold) and most
users get Ollama-fast-enough.
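For readers unfamiliar with the worker-thread pattern, here is a generic sketch built on standard-library channels. It does not reproduce embedded.rs: `PolishJob`, `PolishWorker`, and the closure standing in for the model are all hypothetical, and there is no llama-cpp-2 call anywhere in it.

```rust
use std::sync::mpsc;
use std::thread;

/// One polish request: the raw text plus a channel for the reply.
struct PolishJob {
    raw: String,
    reply: mpsc::Sender<String>,
}

/// Handle to a worker thread that owns the (stand-in) model.
struct PolishWorker {
    jobs: mpsc::Sender<PolishJob>,
}

impl PolishWorker {
    /// Spawn the worker. `model` stands in for real inference.
    fn spawn<F>(model: F) -> Self
    where
        F: Fn(&str) -> String + Send + 'static,
    {
        let (jobs, rx) = mpsc::channel::<PolishJob>();
        thread::spawn(move || {
            // The loop ends once every job sender has been dropped.
            for job in rx {
                // Ignore send errors: the caller may have stopped waiting.
                let _ = job.reply.send(model(&job.raw));
            }
        });
        PolishWorker { jobs }
    }

    /// Submit text and block until the worker replies; None if the worker is gone.
    fn polish(&self, raw: &str) -> Option<String> {
        let (reply, rx) = mpsc::channel();
        self.jobs.send(PolishJob { raw: raw.to_string(), reply }).ok()?;
        rx.recv().ok()
    }
}
```

The point of the pattern is that the model lives on one thread for its whole lifetime and callers only exchange messages with it. Like the limitation noted in the cons below, this sketch has no way to cancel a job that is already running.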
Pros
- Smaller install footprint than option A — no separate Ollama tree,
no HTTP server, just the llama-cpp-2 static lib + the GGUF.
- Cleanest privacy story: no localhost server, no port-binding, no
third subprocess. The embedded.rs module header lists this as
the explicit motivation.
- Cross-platform consistency once we flip the feature on everywhere —
no more "Linux uses cpu_avx2 runner, macOS uses Metal, Windows
uses CUDA DLLs" matrix.
- No upstream auto-update conflict (because no upstream binary).
Cons / blockers
- Output quality gate resolved for closed alpha. This ADR predates
the Qwen3 chat-template path. Embedded polish now renders the GGUF's
native chat template, disables thinking output, stops on ChatML/EOS
markers, and falls back to raw on suspiciously long output. Windows
still needs packaging proof before it can depend on embedded polish,
but chat-template quality is no longer the blocker called out in the
original decision.
- Cancellation is unfinished. Per embedded.rs: "A pending
polish() doesn't yet observe the tokio cancellation signal."
- GPU acceleration on Windows isn't free. Ollama ships a curated
CUDA + ROCm + Vulkan matrix; we'd own the equivalent for
llama-cpp-2.
- Cross-platform commitment. Flipping embedded-llama on for
Windows but not for macOS / Linux means two polish backends to
maintain. Flipping it on everywhere is a much larger ask than the
Windows port itself.
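The "falls back to raw on suspiciously long output" guard from the first bullet might look something like this. The function name and both thresholds are assumptions; the source does not say how "suspiciously long" is measured.

```rust
/// Return the polished text unless it is suspiciously long relative to the
/// raw transcript, in which case return the raw transcript unchanged.
/// `max_ratio` and `slack` are illustrative knobs, not values from the source.
fn guard_polish_length<'a>(
    raw: &'a str,
    polished: &'a str,
    max_ratio: usize,
    slack: usize,
) -> &'a str {
    // Allow the output to grow by a factor plus a fixed allowance.
    let limit = raw
        .chars()
        .count()
        .saturating_mul(max_ratio)
        .saturating_add(slack);
    if polished.chars().count() > limit {
        raw
    } else {
        polished
    }
}
```

The fixed allowance matters for very short dictations, where adding punctuation alone can double the character count.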
Decision
Adopt Option A (bundle), with the Option B fallback path retained as a second-line defense.
Concretely:
1. The Windows .msi ships ollama.exe extracted from
ollama-windows-amd64.zip plus the minimum runner / GGML DLL set
needed to serve CPU inference. CUDA / ROCm / MLX variants are
not bundled in the v1 cut — users with GPUs install Ollama
themselves and select polish_backend=system, the same escape
hatch we offer Linux NVIDIA / AMD users today
(packaging/linux/package.sh:120-122).
2. find_bundled_ollama() gets a Windows-specific candidate path
alongside the existing macOS / Linux ones — the ollama.exe
branch at line 391 already covers the filename, but the location
list (lines 338-388) is Linux-/macOS-only and needs a sibling
bin\ollama.exe candidate matching wherever the .msi lands.
3. If the bundled binary is missing, polish_backend=auto fails
visibly instead of silently touching a user-managed Ollama. Users
who explicitly choose polish_backend=system get the system-Ollama
probe on :11434 via BackendConfig (PolishBackend::System arm,
crates/quill-polish/src/backend.rs:151-175).
4. If neither bundled nor system Ollama responds, the daemon surfaces
a structured "polish unavailable" state and the GUI shows the
Option-B-style "Install Ollama" CTA. Raw STT continues to work.
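The selection rules in items 3 and 4 can be summarized as a small pure function. This is a sketch under stated assumptions: `BackendChoice` and `PolishState` are simplified, hypothetical stand-ins for the real PolishBackend and daemon state types, and the two booleans stand in for the bundled-binary lookup and the :11434 probe.

```rust
/// Simplified stand-in for the configured polish_backend value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BackendChoice {
    Auto,
    System,
}

/// What the daemon ends up reporting to the GUI.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PolishState {
    Bundled,
    System,
    /// Polish is off; raw STT still works and the GUI shows the install CTA.
    Unavailable,
}

/// Resolve the polish state from the configured choice and two probe results.
fn resolve_polish(
    choice: BackendChoice,
    bundled_present: bool,
    system_responds: bool,
) -> PolishState {
    match choice {
        // auto uses the bundled binary and never falls through to a
        // user-managed Ollama, even if one is responding.
        BackendChoice::Auto if bundled_present => PolishState::Bundled,
        BackendChoice::Auto => PolishState::Unavailable,
        // system is an explicit opt-in to the probe on :11434.
        BackendChoice::System if system_responds => PolishState::System,
        BackendChoice::System => PolishState::Unavailable,
    }
}
```

Note that `resolve_polish(BackendChoice::Auto, false, true)` yields `Unavailable` rather than `System`, which is the "fails visibly" behavior from item 3.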
Option C (embedded-llama) stays on the roadmap as the eventual
replacement for all three platforms, but the chat-template and
cancellation work in embedded.rs are prerequisites, and treating
Windows as the forcing function for that migration would couple two
risky workstreams. Better to ship Windows on the same Ollama pipe the
other platforms run today and migrate everyone together later.
Consequences
What changes
- packaging/windows/package.ps1 (new, to be authored in a follow-up
ticket) downloads ollama-windows-amd64.zip, extracts a curated subset,
signs it alongside quill-app.exe + quill-daemon.exe, and rolls
everything into the .msi via WiX or cargo-wix.
- crates/quill-polish/src/backend.rs::bundled_candidates() gains a
Windows arm pointing at the .msi install root.
- specs/polish/polish.spec.md gets a Windows section codifying the
bundled-binary-with-system-fallback contract (was tracked under the
"Tooling > Specs" bullet of #142; that update lands with the
implementation PR, not this decision doc).
- The redistribution NOTICE in packaging/ grows an Ollama line (MIT
attribution + a copy of upstream's LICENSE text inside the installed
tree).
Work unlocked
- The Windows port (#106) gets a concrete polish-path target instead
of an open question. Beta milestone (beta label) can proceed without
waiting on embedded-llama maturity.
Tickets to spin off
- Implement Windows packaging lane. New ticket for the
packaging/windows/package.ps1 script + signed .msi build. It was
blocked on this decision and is now unblocked by it.
- Curate the Windows Ollama subset. Determine the minimum DLL set
inside ollama-windows-amd64.zip for CPU inference, validate that
removing CUDA / cuDNN DLLs doesn't break ollama serve, and document
the curation so it's reproducible across Ollama version bumps.
- Windows polish-disabled UI. The "no Ollama" CTA in the GUI for the
rare case where both the bundled and system paths fail. Mostly UI
work in quill-app.
- Spec the Windows polish path. Update specs/polish/polish.spec.md
with a Windows section + bump version. Lands with the implementation
PR.
- Re-sign + SmartScreen reputation test. One-shot manual test: re-sign
the upstream ollama.exe with our CorvidLabs cert and confirm
SmartScreen doesn't downgrade the signal. If it does, widen the scope
of the packaging ticket to handle it.
What this decision explicitly defers
- The embedded-llama migration. It remains on the roadmap; the
chat-template work in embedded.rs is the gating dependency, not the
Windows port.
- GPU acceleration on Windows. Power users opt in via
polish_backend=system + a manual Ollama install, mirroring Linux
today.