Out-of-process plugin host via threaded callback pump (alternative to #40120)#40769
Draft
benhillis wants to merge 3 commits into
Draft
Out-of-process plugin host via threaded callback pump (alternative to #40120)#40769benhillis wants to merge 3 commits into
benhillis wants to merge 3 commits into
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR implements an alternative callback re-entrancy strategy for the out-of-process WSL plugin host design: instead of adding a second callback lock, it introduces a threaded notification + callback “pump” so plugin callbacks are executed back on the original notifying thread (preserving the existing single m_instanceLock re-entrancy model).
Changes:
- Add a new out-of-process COM local server
wslpluginhost.exeimplementingIWslPluginHostand forwarding plugin API callbacks back to the service viaIWslPluginHostCallback. - Refactor
PluginManagerto activate per-plugin hosts via COM (GIT marshaling), run outbound notifications throughPluginCallPump, and route inbound callbacks via the pump or a timed direct-lock path to avoid deadlocks. - Expand/adjust Windows tests and installer/packaging to include
wslpluginhost.exeand validate host-crash isolation + concurrency/callback scenarios.
Reviewed changes
Copilot reviewed 32 out of 32 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| test/windows/testplugin/Plugin.cpp | Extends test plugin to exercise host-crash, concurrent callbacks, async worker callbacks, and teardown races. |
| test/windows/PluginTests.h | Adds new PluginTestType cases for out-of-proc host scenarios. |
| test/windows/PluginTests.cpp | Adds/updates tests to validate host crash behavior, concurrent callbacks, async worker callbacks, and teardown-race stability. |
| test/windows/InstallerTests.cpp | Ensures installed executable set includes wslpluginhost.exe. |
| test/windows/Common.h | Adds wslpluginhost.exe to test “BinaryUnderTest” list; declares service PID helpers. |
| test/windows/Common.cpp | Implements GetWslServicePid() and GetWslServiceRunningPid() to validate service survival without silent restarts. |
| src/windows/wslpluginhost/exe/resource.h | Adds resource declarations for the new plugin host executable. |
| src/windows/wslpluginhost/exe/PluginHost.h | Declares the COM host class that loads plugin DLLs and forwards callbacks to the service. |
| src/windows/wslpluginhost/exe/PluginHost.cpp | Implements plugin host initialization, hook dispatch, and service-callback stubs (including COM init on worker threads). |
| src/windows/wslpluginhost/exe/main.rc | Adds version/icon resources for wslpluginhost.exe. |
| src/windows/wslpluginhost/exe/main.cpp | Implements COM local server lifecycle (factory registration, mitigation policies, startup timeout). |
| src/windows/wslpluginhost/exe/CMakeLists.txt | Builds the new wslpluginhost target and links required libraries/headers. |
| src/windows/wslpluginhost/CMakeLists.txt | Adds plugin host subdirectory to the build. |
| src/windows/wslinstall/DllMain.cpp | Registers wslpluginhost.exe in LSP categories list. |
| src/windows/service/stub/CMakeLists.txt | Adds generated proxy/stub sources for WslPluginHost.idl into wslserviceproxystub.dll. |
| src/windows/service/inc/WslPluginHost.idl | Introduces COM interfaces IWslPluginHost and IWslPluginHostCallback plus CLSID. |
| src/windows/service/inc/CMakeLists.txt | Adds wslpluginhostidl generation. |
| src/windows/service/exe/WSLCSessionManager.h | Reworks session iteration helpers to avoid holding session lock across out-of-proc plugin notifications. |
| src/windows/service/exe/WSLCSessionManager.cpp | Moves stopping notifications out from under session lock; adds rollback/race handling around OnWslcSessionCreated. |
| src/windows/service/exe/ServiceMain.cpp | Calls g_pluginManager.Shutdown() before CoUninitialize to avoid proxy teardown after COM shutdown. |
| src/windows/service/exe/PluginManager.h | Major refactor for out-of-proc hosts, GIT proxying, job object ownership, WSLC session ref-map, and pump routing. |
| src/windows/service/exe/PluginManager.cpp | Implements out-of-proc activation, host-crash latching, pump-driven notifications, and callback routing logic. |
| src/windows/service/exe/PluginCallPump.h | Adds the pump primitive for running notifications on a worker while executing callbacks on the notifying thread. |
| src/windows/service/exe/PluginCallPump.cpp | Implements pump queueing, worker coordination, and safe shutdown semantics. |
| src/windows/service/exe/LxssUserSession.h | Adds TryInvokeUnderInstanceLock to support timed direct-callback execution without deadlocking. |
| src/windows/service/exe/LxssUserSession.cpp | Implements TryInvokeUnderInstanceLock. |
| src/windows/service/exe/CMakeLists.txt | Adds PluginCallPump and wslpluginhostidl dependency to wslservice. |
| src/windows/common/precomp.h | Adds <shared_mutex> include to shared PCH. |
| msipackage/package.wix.in | Installs wslpluginhost.exe and registers its COM/AppID/Interface entries. |
| msipackage/CMakeLists.txt | Adds wslpluginhost.exe to MSI packaging inputs and dependencies. |
| CMakeLists.txt | Adds src/windows/wslpluginhost to the top-level build. |
| .pipelines/build-stage.yml | Adds wslpluginhost to CI build targets/artifact patterns. |
deedac2 to
67d56d6
Compare
WSL plugin DLLs are moved out of wslservice.exe into a separate wslpluginhost.exe COM server so plugin code can no longer crash or destabilize the service. Each plugin is activated in its own host process (CLSCTX_LOCAL_SERVER, SYSTEM-only via AppID) and reached through a versioned COM interface defined in WslPluginHost.idl. All hosts are tied to a service-owned job object and terminate when wslservice exits. The plugin API is unchanged; existing plugins run unmodified. A crashing or disconnected host is classified by IsHostCrash (RPC_E_DISCONNECTED, RPC_E_SERVER_DIED[_DNE], CO_E_OBJNOTCONNECTED, RPC_S_SERVER_UNAVAILABLE, RPC_S_CALL_FAILED[_DNE]); the service logs it and continues instead of treating it as a fatal plugin error. RPC_E_CALL_REJECTED is intentionally excluded as a transient busy state rather than a dead host. Plugin->service callbacks (MountFolder, ExecuteBinary, and the WSLC session APIs) arrive on a different COM thread than the outbound hook, so they cannot re-enter the lock held during the hook: - VM path: LxssUserSessionImpl guards callbacks with a shared_mutex (shared for callbacks, exclusive in _VmTerminate after OnVmStopping drains in-flight callbacks before the utility VM is destroyed). - WSLC path: PluginManager resolves sessions through its own reference map under a dedicated lock, and WSLCSessionManager releases its session lock before any plugin notification fires, so callbacks never re-enter the session lock. A session is registered in the reference map but not published until OnWslcSessionCreated succeeds, so a vetoed or race-lost session is never handed out. Proxy/stub is consolidated into wslserviceproxystub.dll. One new exe, no new DLLs. Tests - HostCrashIsolation: kills wslpluginhost.exe mid-OnVmStarted and verifies the service survives and m_initOnce stays sticky. - ConcurrentCallbacks: four plugin threads hammer MountFolder and ExecuteBinary, exercising the shared callback lock. - AsyncApiCallFromWorker: a plugin worker thread calls into the service post-hook (cross-apartment, non-COM-initialized). - CallbacksDuringTerminationDoNotCrash: worker threads race _VmTerminate's exclusive lock and VM teardown, then wind down deterministically after OnVmStopping signals them and are joined on the next session start. - Existing WSL1 plugin tests broadened alongside the refactor. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
67d56d6 to
22ab719
Compare
- Pass the kill-on-close job object into IWslPluginHost::Initialize and drop GetProcessHandle so the host assigns itself to the job before running any plugin code. - Return an IWSLCProcess reference from WslcCreateProcess instead of opaque process cookies; remove the cookie bookkeeping methods. - Make plugin host crashes fatal: a crash during a start/veto hook (or at load) blocks the WSL operation with a recorded, plugin-named error, matching pre-refactor behavior. Latch the first crash so subsequent operations fail fast with one consistent error instead of repeatedly driving a dead host (m_pluginError guarded by m_pluginErrorLock). Teardown hooks latch but stay non-fatal. Re-activation/non-fatal resilience is a follow-up. - Drop the sidData empty-guards so WSL and WSLC hooks pass the serialized SID consistently. - Rework the HostCrash test to assert the new fatal behavior. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rocess plugin host Alternative to the m_callbackLock design in PR #40120, per @OneBlue's review. Instead of guarding plugin callbacks with a new shared_mutex (m_callbackLock) and a session-ref registry, this runs each outbound notification (host->On...) on a worker thread and pumps the plugin's service-side API calls back onto the original notifying thread, which already holds the session's recursive m_instanceLock. This reproduces in-process re-entrancy and lets us delete the synchronization the PR added. - New PluginCallPump primitive (Run/Invoke): the worker makes the COM notification call; the notifying thread pumps queued callback work and runs it under m_instanceLock. - PluginManager: WSL notifications (OnVm*/OnDistribution*) route via RunHostNotification; WSL callbacks (MountFolder/ExecuteBinary[InDistribution]) route via a hybrid InvokeOnWslPump - pump when a hook is in flight, direct (own m_instanceLock) when not, preserving async out-of-hook callbacks. - Revert m_callbackLock in LxssUserSessionImpl; MountRootNamespaceFolder/CreateLinuxProcess go back to the recursive m_instanceLock. Add LxssUserSessionImpl::TryInvokeUnderInstanceLock (timed try_lock_for on m_instanceLock) used by the direct path. - Deadlock fix: a callback racing the pre-notification window (lock held, pump not yet registered) must not block on m_instanceLock - InvokeOnWslPump uses a timed acquire and re-checks for a pump, so the racing callback is routed onto the pump instead of dead-blocking. - Hardening (from multi-model review): PluginCallPump::Invoke reports ran-vs-stopped out-of-band (no RPC_E_DISCONNECTED sentinel overload, avoids double-execution); the pump loop polls workerDone each iteration so callback traffic cannot starve the stop path; Run is exception-safe so std::thread creation failure never escapes into teardown hooks. - WSLC session-ref registry left as-is (documented follow-up): its lock is non-recursive, notifications already fire outside it, and it is entangled with the create-veto protocol. - Update test/plugin comments to describe the pump model (assertions unchanged and still pass). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
22ab719 to
cdb7b18
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
WSL plugin DLLs are moved out of
wslservice.exeinto a separatewslpluginhost.exeCOM server (same as #40120). The difference is how plugin callback re-entrancy is handled while the service is blocked in an outbound notification:std::shared_mutex m_callbackLock) plus a session-ref registry, with a dual-lock discipline (m_instanceLock+m_callbackLock) spread across callback-reachable session state.host->On...) on a worker thread and pumps the plugin's service-side API calls back onto the original notifying thread, which already holds the session's recursivem_instanceLock. This reproduces the in-process re-entrancy model with a single lock — nom_callbackLock, no new dual-lock annotations.Trade-off: the pump is ~+195 net code lines (a new
PluginCallPumpprimitive + per-notification wiring) but keeps the single-lock model the rest of the session already assumes.PR Checklist
PluginTestspass on a real host: 30 passed, 0 failed, 1 skipped (signed-only)Detailed Description
PluginCallPump (
src/windows/service/exe/PluginCallPump.{h,cpp})Run(notification)spawns a worker thread that makes the outbound COM call; the calling (notifying) thread pumps queuedInvoke(work)calls and runs them underm_instanceLock, so recursive locks re-enter exactly as in-process.Invokereports ran-vs-stopped out-of-band (no HRESULT sentinel), is exception-safe, and cannot strand an RPC thread.Routing (
PluginManager.cpp)OnVm*,OnDistribution*) route viaRunHostNotification, which registers a per-session pump for the duration of the call.MountFolder,ExecuteBinary[InDistribution]) route viaInvokeOnWslPump: pump when a hook is in flight, direct (ownm_instanceLock) otherwise — preserving async out-of-hook callbacks.Deadlock fix
A callback racing the pre-notification window (lock held, pump not yet registered) must not block on
m_instanceLock.InvokeOnWslPumpuses a timed acquire (TryInvokeUnderInstanceLock) and re-checks for a pump, routing the racing callback onto the pump instead of dead-blocking.Scope note
The WSLC session-ref registry is intentionally left as-is (its lock is non-recursive, notifications already fire outside it, and it is entangled with the create-veto protocol). Documented as a follow-up.
Validation Steps
Result:
Total=31, Passed=30, Failed=0, Skipped=1(skip = signed-build-only signature test). Includes the previously-deadlockingCallbacksDuringTerminationDoNotCrash.