Add native WSL2 VM backend for Windows #667

Closed
qwatts-dev wants to merge 8 commits into roots:master from qwatts-dev:feature/wsl-backend

@qwatts-dev

Replaces #665 (moved to a branch per @retlehs' request so maintainers can make direct edits).

Summary

Adds a wsl VM backend so Windows users get a native trellis vm experience using WSL2 — no nested VMs, no Vagrant, no VirtualBox.

This mirrors the Lima backend's role on macOS/Linux: each Trellis project gets its own isolated Ubuntu 24.04 environment with project files on ext4, Ansible running locally inside the distro, and ports accessible directly from Windows.

Discourse thread: https://discourse.roots.io/t/native-wsl2-vm-backend-for-trellis-on-windows-looking-for-testers/30281

Motivation

Windows users currently have two options, both painful:

  1. Lima inside WSL2 — QEMU nested virtualization is slow (~14s TTFB on DrvFS), VMs break when WSL sleeps, and port forwarding is fragile.
  2. Manual WSL setup — No trellis vm integration, no auto-provisioning, no config sync.

WSL2 is already a VM — this backend uses it directly instead of nesting another VM inside it.

What's new

New files

| File | Purpose |
| --- | --- |
| pkg/wsl/manager.go | Core vm.Manager implementation (~1150 lines) |
| pkg/wsl/hosts.go | Windows hosts file management with UAC elevation |
| pkg/wsl/ubuntu.go | Ubuntu rootfs URL registry (22.04, 24.04) |
| cmd/vm_open.go | Opens VS Code via --folder-uri vscode-remote://wsl+<distro>/path |
| cmd/vm_sync.go | Manual WSL→Windows rsync sync |
| cmd/vm_trust.go | Re-imports SSL certs into Windows trust store |
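
As a rough illustration of the URI shape that cmd/vm_open.go passes to VS Code, here is a minimal sketch. The helper name `wslFolderURI` and the per-segment escaping are assumptions for illustration, not code taken from this PR; only the `vscode-remote://wsl+<distro>/path` pattern comes from the table above.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// wslFolderURI builds a value for VS Code's --folder-uri flag that points at
// a directory inside a WSL distro. Hypothetical helper, not from the PR.
func wslFolderURI(distro, linuxPath string) string {
	// Escape each path segment so spaces and similar characters survive.
	segments := strings.Split(strings.TrimPrefix(linuxPath, "/"), "/")
	for i, s := range segments {
		segments[i] = url.PathEscape(s)
	}
	return "vscode-remote://wsl+" + distro + "/" + strings.Join(segments, "/")
}

func main() {
	// Distro and project names below are made-up examples.
	fmt.Println(wslFolderURI("trellis-example.com", "/home/admin/example.com"))
}
```

The resulting string would then be handed to the `code` launcher as `--folder-uri <uri>`.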

Modified files

| File | Change |
| --- | --- |
| cmd/vm.go | case "wsl" in newVmManager() + two guard functions |
| cmd/vm_start.go | WSL bootstrap/provision flow, unprovisioned distro cleanup with confirmation prompt |
| cmd/vm_stop.go | Auto SyncBack before stop |
| cmd/vm_delete.go | windowsHostRequired() guard |
| cmd/vm_shell.go | windowsHostRequired() guard |
| cmd/init.go | WSL guard: explains that dependencies are managed inside the VM |
| trellis/trellis.go | VmManagerType() returns "wsl" on Windows, WSL auto-detection, CheckVirtualenv skip |
| cmd/db_open.go | Direct mysql:// URI for WSL (no SSH tunnel needed) |
| pkg/db_opener/tableplus.go | rundll32.exe URI opening for Windows |
| github/main.go | Retry loop for os.Rename (Windows antivirus file locks) |
| cmd/provision.go, cmd/deploy.go, etc. | wslTerminalRequired() guard on Ansible commands |
| main.go | Register new vm open, vm sync, vm trust commands |

Design decisions

  • Follows the Lima pattern. The wsl.Manager implements vm.Manager identically to lima.Manager. All WSL-specific code lives in pkg/wsl/ — no Windows logic scattered elsewhere.
  • Auto-detected. VmManagerType() returns "wsl" when runtime.GOOS == "windows" and the manager is "auto". No user configuration needed.
  • Project isolation. Each project gets its own WSL2 distro (named trellis-<site>). Projects are rsync'd to ext4 at /home/admin/<project>/.
  • No SSH. Uses ansible_connection=local with ansible_user=admin. No SSH keys, no tunnels.
  • Two guard functions keep users on the right track:
    • wslTerminalRequired() — redirects Ansible commands from Windows → "run trellis vm open first"
    • windowsHostRequired() — redirects VM management from WSL → "run trellis <command> from Windows PowerShell"
  • One project at a time. WSL2 distros share a network namespace (by design, per Microsoft). StartInstance prompts to sync and stop other running trellis-* distros.
  • Config sync. syncConfigFromWSL() rsyncs group_vars/ from ext4→Windows on manager init, keeping the Windows-side repo current.
  • Upstream-ready CLI install. Bootstrap installs trellis inside the distro via the official install script (scripts/get). A dev override checks for a local cross-compiled binary first (for testing from source before a release exists).
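
The auto-detection and naming rules in the list above can be sketched as follows. The function names, the `goos` parameter (added so the rule can be exercised on any platform), and the fallback to "lima" off Windows are assumptions; the PR only states that "auto" resolves to "wsl" on Windows, that distros are named trellis-<site>, and that projects live under /home/admin/<project>/.

```go
package main

import (
	"fmt"
	"path"
	"runtime"
)

// resolveManager applies the described rule: an explicit setting wins, and
// "auto" becomes "wsl" on Windows. The "lima" fallback is an assumption.
func resolveManager(configured, goos string) string {
	if configured != "auto" {
		return configured
	}
	if goos == "windows" {
		return "wsl"
	}
	return "lima"
}

// distroName returns the per-project WSL2 distro name (trellis-<site>).
func distroName(site string) string {
	return "trellis-" + site
}

// projectPath returns the ext4 location of the project inside the distro.
func projectPath(project string) string {
	return path.Join("/home/admin", project)
}

func main() {
	fmt.Println(resolveManager("auto", runtime.GOOS))
	fmt.Println(distroName("example.com"), projectPath("example.com"))
}
```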

Changes since #665

  • Added WSL guard to trellis init — prints a clear message instead of failing with a virtualenv error
  • Improved windowsHostRequired() guard to echo back the actual command (e.g. "Run 'trellis vm start' from Windows PowerShell")
  • Added windowsHostRequired() guard to vm open and vm sync (were showing confusing "only supported on Windows" when run from WSL)
  • Made CLI install in bootstrap upstream-ready — sidecar first (dev), falls back to scripts/get (upstream releases)
  • Fixed vm start silently deleting provisioned distros that predate the marker file system (two-tier provisioned check with confirmation prompt)
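
To illustrate the improved guard message, here is a minimal sketch of a guard that echoes the command back to the user. It assumes that "running inside WSL" is detected through the WSL_DISTRO_NAME environment variable (the checklist mentions such checks); the helper name `hostGuardMessage` and its signature are hypothetical and do not reflect the real windowsHostRequired() implementation.

```go
package main

import (
	"fmt"
	"os"
)

// hostGuardMessage reports whether a VM management command should be blocked
// because it is running inside WSL, and the message to show if so.
// getenv is injected so the check can be exercised without a real WSL session.
func hostGuardMessage(command string, getenv func(string) string) (string, bool) {
	if getenv("WSL_DISTRO_NAME") == "" {
		return "", false // not inside WSL: let the command run
	}
	return fmt.Sprintf("Run 'trellis %s' from Windows PowerShell", command), true
}

func main() {
	if msg, blocked := hostGuardMessage("vm start", os.Getenv); blocked {
		fmt.Println(msg)
		os.Exit(1)
	}
	fmt.Println("running on the Windows host (or outside WSL)")
}
```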

Testing

Tested on Windows 11 with WSL2:

  • Fresh project (trellis new) — full bootstrap + provision + site loads ✅ (also verified by @retlehs in #665)
  • Existing production site (Sage theme, ACF Pro, restored database)
  • vm start / stop / shell / open / delete lifecycle
  • provision, deploy, db open commands
  • SSL cert trust import
  • Config sync (group_vars roundtrip)
  • Multiple distro switching (auto stop/sync of other projects)
  • trellis init guard message on Windows ✅
  • Guard messages from inside WSL terminal ✅

Checklist

  • go vet ./... passes
  • golangci-lint run passes (0 issues)
  • Follows existing code patterns (command package for exec, color for output, promptui for prompts)
  • No changes to Lima backend behavior
  • macOS/Linux codepaths unaffected (WSL code gated behind runtime.GOOS == "windows" or WSL_DISTRO_NAME checks)

The upstream trellis-cli supports Lima for macOS/Linux local development.
This adds a WSL2 backend so Windows developers get the same first-class
experience via `trellis vm start`.

New WSL2 backend (pkg/wsl/):
- Manager implementing vm.Manager using wsl.exe commands
- WindowsHostsResolver for hosts file management with UAC elevation
- Ubuntu rootfs registry (22.04, 24.04)
- Bootstrap installs Python, Ansible, Node.js LTS, Corepack
- Project files copied to ext4 for native performance (~80ms vs ~14s TTFB)
- Auto-stops other trellis distros (shared network namespace)
- SyncBack prompt before stopping other running distros
- Breadcrumb file for cross-distro SyncBack support

New commands:
- vm open: Launch VS Code connected to WSL distro
- vm sync: Manual WSL-to-Windows file sync
- vm trust: Re-import self-signed SSL certs into Windows trust store

Enhanced existing commands:
- vm start/stop/delete/shell: WSL2 backend support
- db open: Works from both Windows and WSL terminals
- provision, deploy, vault, galaxy, xdebug-tunnel: Windows host
  detection with redirect to WSL terminal

Other changes:
- Windows os.Rename retry loop for antivirus file locks
- rundll32 URI handler (fixes cmd.exe & parsing in URIs)
- UTF-16LE decoder for wsl.exe output
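
wsl.exe writes much of its own output as UTF-16LE, which is why a decoder is needed before the text can be parsed. A minimal sketch using only the Go standard library is shown below; the function name `decodeUTF16LE` and the handling of a leading byte-order mark and a trailing odd byte are assumptions, not the PR's actual code.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16LE converts UTF-16LE bytes (as produced by wsl.exe) to a Go string.
func decodeUTF16LE(b []byte) string {
	// Ignore a dangling final byte that cannot form a 16-bit unit.
	if len(b)%2 != 0 {
		b = b[:len(b)-1]
	}
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i < len(b); i += 2 {
		units = append(units, binary.LittleEndian.Uint16(b[i:]))
	}
	// Drop a leading byte-order mark if present.
	if len(units) > 0 && units[0] == 0xFEFF {
		units = units[1:]
	}
	return string(utf16.Decode(units))
}

func main() {
	raw := []byte{'O', 0, 'K', 0, '\r', 0, '\n', 0}
	fmt.Printf("%q\n", decodeUTF16LE(raw))
}
```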
…backend

feat: Add native WSL2 virtual machine backend for Windows
The isProvisioned check relied solely on an external .provisioned marker file. Distros provisioned before the marker system was introduced (or whose marker was lost) were incorrectly identified as unprovisioned and silently deleted on the next vm start.

Changes:

- isProvisioned() now has a two-tier check: marker file first, then falls back to checking /etc/trellis-project-root (breadcrumb written during bootstrap) inside the distro. Self-heals the marker on success.

- vm start now prompts for confirmation before deleting a distro that appears unprovisioned, instead of silently deleting it.
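
The two-tier check described in this commit can be sketched as below. The struct and field names are hypothetical, and the individual probes are injected as functions so the logic stands alone; in the PR they would correspond to checking the .provisioned marker file and the /etc/trellis-project-root breadcrumb inside the distro.

```go
package main

import "fmt"

// provisionChecks bundles the probes used by the two-tier check.
// All names here are illustrative, not taken from the PR.
type provisionChecks struct {
	markerExists     func() bool  // host-side .provisioned marker file
	breadcrumbExists func() bool  // /etc/trellis-project-root inside the distro
	writeMarker      func() error // recreates the marker (self-heal)
}

// isProvisioned checks the marker first, then falls back to the breadcrumb.
// When only the breadcrumb is found, it tries to restore the marker.
func isProvisioned(c provisionChecks) bool {
	if c.markerExists() {
		return true
	}
	if !c.breadcrumbExists() {
		return false
	}
	// Self-heal is best effort: a failed write does not change the answer.
	_ = c.writeMarker()
	return true
}

func main() {
	healed := false
	ok := isProvisioned(provisionChecks{
		markerExists:     func() bool { return false },
		breadcrumbExists: func() bool { return true },
		writeMarker:      func() error { healed = true; return nil },
	})
	fmt.Println(ok, healed)
}
```

Only when both tiers fail would vm start treat the distro as unprovisioned and ask for confirmation before deleting it.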
…fety

Fix vm start silently deleting provisioned WSL distros
…stall

- Add WSL guard to 'trellis init': detects WSL backend and prints a
  message explaining that dependencies are managed inside the VM
  automatically, instead of failing with a virtualenv error.

- Improve windowsHostRequired() guard message: echoes back the actual
  command to run (e.g. "Run 'trellis vm start' from Windows PowerShell")
  instead of the generic "Run this command from PowerShell".

- Add windowsHostRequired() guard to 'vm open' and 'vm sync': users
  running these from inside WSL now get the correct "run from PowerShell"
  message instead of a confusing "only supported on Windows (WSL2)" error.

- Make CLI install in bootstrap upstream-ready: checks for a cross-compiled
  trellis-linux sidecar first (dev/fork builds), falls back to the official
  install script (scripts/get) for upstream releases. Previously, if no
  sidecar was found, the distro silently had no CLI binary.
Improve WSL UX: init guard, guard messages, and upstream-ready CLI install
@swalkinshaw
Member

@qwatts-dev thanks for your contribution here; this is pretty cool to see. However, it's not something we can officially support and integrate into trellis-cli. The problem with this model of Windows support is that it effectively creates an entirely separate mode of execution compared to what exists now, where Ansible runs on the host. This would require either a complete refactor or conditionals in almost every single command, as this PR has now. It would just create a much more complex codebase that is harder to maintain and test.

We'd be interested in exploring QEMU nested virtualization more to see if there's any way to improve performance there. If that were usable, it would only require minimal changes to officially support it.

@qwatts-dev
Author

Thanks for the thorough explanation @swalkinshaw! I genuinely appreciate it, and the reasoning makes total sense. As I was setting the conditionals, I was thinking "man this will change the experience for Windows devs a LOT", haha. So, I knew that would be the awkward part of this PR.

For the QEMU nested virtualization direction - funny timing! I actually built a working QEMU on WSL bash shim for my team before attempting this native WSL path (talked about it a bit in my Roots Discourse post).

Just a heads up as you guys explore that route for Windows users: the biggest hurdle we ran into was that Windows browsers can't natively route to the internal network that QEMU sets up inside WSL2. My shim (about 1,500 lines of bash) hacked around this using SSH tunnels and port forwarding to expose the VM to Windows, plus some UAC prompts to manage the Windows hosts file and SSL cert trust (similar to how this fork/PR handled SSL).

It works, but that heavy complexity is actually what pushed me toward trying this native WSL path instead.

All in all, it's been an awesome experience contributing! As y'all explore QEMU nested virtualization, I'm happy to dump my existing bash scripts and WSL network notes here or in an issue.
