Make gradual the default memory reclaim mode #40780

Closed
benhillis wants to merge 4 commits into microsoft:master from benhillis:benhill/mem-reclaim-4-default

@benhillis

Member

Stacked PR 4 of 4 — builds on the full stack. Diff shown is cumulative and represents the complete change.

What this does

Now that gradual reclaim is pressure-driven and protects the working set with an adaptive floor, it reclaims more cold memory than the idle-gated drop_caches sledgehammer while being safe to run under load. This flips the default:

  • WSL2 VM mini-init: via the MemoryReclaim config default (WslCoreConfig.h) that flows to the guest.
  • WSLc container init: the hardcoded default in WSLCInit.cpp.

DropCache and Disabled remain available via [experimental] autoMemoryReclaim.

Stack

  1. #40777: rework the memory reduction thread around explicit reclaim helpers
  2. drive gradual reclaim by memory pressure (PSI)
  3. adaptive working-set floor via refaults
  4. this PR — make gradual the default

Ben Hillis and others added 4 commits June 11, 2026 10:17
Replace the ring-buffer idle detector and user-CPU-only sampling in the
mini-init memory reduction thread with a clearer, helper-based design:

- Sample aggregate non-idle CPU time (user, system, irq, softirq, steal)
  so kernel-bound work keeps the VM out of the idle state, instead of
  looking at user time alone.
- GetReclaimableCacheBytes / GetFreeMemoryBytes read the relevant procfs
  counters via a small ReadProcFile helper.
- Gradual mode reclaims cold page cache (cgroup memory.reclaim) above a
  fixed floor while CPU-idle, with a hysteresis margin so it does not
  churn near the floor.
- DropCache mode stays gated on sustained CPU idle, drops once, and
  re-drops only after the reclaimable cache grows meaningfully.
- Compaction is gated on free-memory growth so it runs only when there
  are newly-freed pages worth coalescing.
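
The aggregate CPU sampling described in the first bullet could look roughly like the sketch below. It is not the PR's code: ParseNonIdleCpuTicks is a hypothetical name, and leaving out nice time is an assumption based only on the classes listed above. The field order follows the aggregate "cpu" line of /proc/stat as documented in proc(5).

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper (not from the PR): sums the non-idle fields of the
// aggregate "cpu" line of /proc/stat. Values are in clock ticks.
std::optional<uint64_t> ParseNonIdleCpuTicks(const std::string& StatLine)
{
    std::istringstream Stream(StatLine);
    std::string Label;

    // Field order per proc(5): user nice system idle iowait irq softirq steal ...
    uint64_t User = 0, Nice = 0, System = 0, Idle = 0, IoWait = 0, Irq = 0, SoftIrq = 0, Steal = 0;
    if (!(Stream >> Label >> User >> Nice >> System >> Idle >> IoWait >> Irq >> SoftIrq >> Steal) || Label != "cpu")
    {
        return std::nullopt;
    }

    // Assumption: only the classes named in the commit message are counted;
    // nice, idle and iowait are excluded in this sketch.
    return User + System + Irq + SoftIrq + Steal;
}
```

A caller would sample this value twice and compare the delta against the elapsed ticks to decide whether the interval was idle.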

All procfs writes are best-effort and never tear down the long-lived
reduction thread on a transient error.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
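
As an illustration of the hysteresis margin and the compaction gate described in this commit, a minimal sketch follows. The function names, the margin, and the growth threshold are all hypothetical; the actual constants and structure in util.cpp are not shown in this conversation.

```cpp
#include <cstdint>

// Hypothetical: reclaim only once the cache exceeds floor + margin, then
// drain down to the floor. Values between floor and floor + margin are left
// alone so the thread does not churn near the floor.
uint64_t ExcessAboveFloor(uint64_t CacheBytes, uint64_t FloorBytes, uint64_t MarginBytes)
{
    return (CacheBytes > FloorBytes + MarginBytes) ? CacheBytes - FloorBytes : 0;
}

// Hypothetical: compact only when free memory has grown by at least
// MinGrowthBytes since the last compaction.
bool ShouldCompact(uint64_t FreeBytes, uint64_t FreeBytesAtLastCompaction, uint64_t MinGrowthBytes)
{
    return FreeBytes >= FreeBytesAtLastCompaction && FreeBytes - FreeBytesAtLastCompaction >= MinGrowthBytes;
}
```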
A CPU-bound workload can sit on gigabytes of cold page cache that a
CPU-idle check would never reclaim. Read the PSI "some avg10" memory
pressure from /proc/pressure/memory and reclaim cold cache toward the
fixed floor whenever pressure is low, even while the VM is busy, backing
off once the workload starts stalling on memory.

A busy interval reclaims at most a bounded step (c_gradualStepBusyBytes)
so a large backlog is drained gently; an idle interval drains the full
excess at once. When PSI is unavailable (kernel built without
CONFIG_PSI), gradual reclaim falls back to gating on CPU idle.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
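
The PSI gating in this commit can be sketched as below, under stated assumptions. ParsePsiSomeAvg10 and ReclaimStepBytes are hypothetical names, the low-pressure threshold is a parameter rather than the PR's actual value, and returning zero for an idle interval under high pressure is a guess; only c_gradualStepBusyBytes is named in the commit message. The input format is the kernel's documented /proc/pressure/memory layout ("some avg10=... avg60=... avg300=... total=...").

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: extracts the "some avg10" percentage from the
// contents of /proc/pressure/memory. Returns nullopt if the line is absent.
std::optional<double> ParsePsiSomeAvg10(const std::string& Contents)
{
    std::istringstream Stream(Contents);
    std::string Line;
    while (std::getline(Stream, Line))
    {
        double Avg10 = 0.0;
        if (std::sscanf(Line.c_str(), "some avg10=%lf", &Avg10) == 1)
        {
            return Avg10;
        }
    }

    return std::nullopt;
}

// Hypothetical policy: how many bytes to reclaim this interval.
uint64_t ReclaimStepBytes(
    uint64_t CacheBytes, uint64_t FloorBytes, bool CpuIdle, std::optional<double> Pressure, double LowPressurePercent, uint64_t BusyStepBytes)
{
    if (CacheBytes <= FloorBytes)
    {
        return 0;
    }

    const uint64_t Excess = CacheBytes - FloorBytes;
    if (!Pressure.has_value())
    {
        // PSI unavailable (kernel built without CONFIG_PSI): gate on CPU idle.
        return CpuIdle ? Excess : 0;
    }

    if (*Pressure >= LowPressurePercent)
    {
        // The workload is stalling on memory: back off.
        return 0;
    }

    // Idle drains the full excess; busy takes at most one bounded step.
    return CpuIdle ? Excess : std::min(Excess, BusyStepBytes);
}
```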
Reclaiming toward a single fixed floor either gives back too little on
large VMs or evicts pages a larger working set immediately faults back
in. Make the floor adaptive and self-regulating:

- Track file refaults (/proc/vmstat workingset_refault*) as a signal
  that reclaim is cutting into the working set. When refaults spike (or
  PSI pressure rises into the backoff band), raise the floor to protect
  what the workload is actually using and stop reclaiming.
- After sustained calm, decay the floor back toward the base so a
  shrunken working set is eventually re-probed downward.
- Scale the floor's upper bound to a fraction of guest RAM so large
  working sets on large VMs can be fully protected, falling back to a
  fixed cap when total RAM is unavailable.

The PSI-unavailable path keeps the same refault brake and floor decay so
behavior degrades gracefully on kernels without CONFIG_PSI.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
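
One way to picture the adaptive floor from this commit is the sketch below. Everything in it is illustrative: the AdaptiveFloor struct, the step sizes, the calm-interval count, and the one-half RAM fraction are assumptions, not values taken from the PR.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical state for an adaptive working-set floor.
struct AdaptiveFloor
{
    uint64_t BaseBytes;                // floor to decay back toward
    uint64_t CapBytes;                 // upper bound on the floor
    uint64_t RaiseStepBytes;           // how much a refault spike raises the floor
    uint64_t DecayStepBytes;           // how much sustained calm lowers it
    uint32_t CalmIntervalsBeforeDecay; // calm intervals required before each decay
    uint64_t FloorBytes;               // current floor
    uint32_t CalmIntervals;            // consecutive calm intervals seen so far

    // Returns true when reclaim may proceed this interval.
    bool Update(uint64_t RefaultsThisInterval, uint64_t RefaultSpikeThreshold, bool PressureInBackoffBand)
    {
        if (RefaultsThisInterval >= RefaultSpikeThreshold || PressureInBackoffBand)
        {
            // Reclaim is cutting into the working set: raise the floor and stop.
            FloorBytes = std::min(CapBytes, FloorBytes + RaiseStepBytes);
            CalmIntervals = 0;
            return false;
        }

        if (++CalmIntervals >= CalmIntervalsBeforeDecay)
        {
            // Sustained calm: decay toward the base so a smaller working set is re-probed.
            FloorBytes = (FloorBytes > BaseBytes + DecayStepBytes) ? FloorBytes - DecayStepBytes : BaseBytes;
            CalmIntervals = 0;
        }

        return true;
    }
};

// Hypothetical: scale the cap to a fraction of guest RAM (one half here, an
// assumed value), falling back to a fixed cap when total RAM is unknown (0).
uint64_t FloorCapBytes(uint64_t TotalRamBytes, uint64_t FallbackCapBytes)
{
    return TotalRamBytes != 0 ? TotalRamBytes / 2 : FallbackCapBytes;
}
```

RefaultsThisInterval would be the delta of the workingset_refault* counters in /proc/vmstat between two samples.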
Now that gradual reclaim is pressure-driven and protects the working set
with an adaptive floor, it reclaims more cold memory than the
idle-gated drop_caches sledgehammer while being safe to run under load.
Make it the default for both the WSL2 VM mini-init (via the config
default that flows to the guest) and the WSLc container init.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 11, 2026 17:22

Copilot AI left a comment

Contributor

Pull request overview

This PR makes Gradual the default memory reclaim mode for both WSL2 VMs and WSLc containers, aligning defaults with the newer pressure-driven gradual reclaim behavior in the Linux mini-init memory reduction thread.

Changes:

  • Flip the WSL2 guest default MemoryReclaim config from DropCache to Gradual (WslCoreConfig.h).
  • Flip the WSLc container init default to start the memory reduction thread in Gradual mode (WSLCInit.cpp).
  • Update/refactor the Linux init memory reduction logic to support pressure-driven gradual reclaim with an adaptive floor (util.cpp, cumulative stack diff).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/windows/common/WslCoreConfig.h | Changes the default VM MemoryReclaim mode to Gradual. |
| src/linux/init/WSLCInit.cpp | Starts the container memory reduction thread in Gradual mode by default. |
| src/linux/init/util.cpp | Implements/refines the gradual reclaim policy and supporting procfs parsing used by the reduction thread. |

Comment thread src/linux/init/util.cpp
Comment on lines 3538 to +3539
 {
-    wil::unique_fd Fd{open("/proc/stat", O_RDONLY)};
+    wil::unique_fd Fd{open(Path, O_RDONLY)};
Comment thread src/linux/init/util.cpp
Comment on lines +4053 to +4057
// Best-effort: WriteToFile logs internally on failure. EAGAIN merely means the kernel could
// not evict the full amount this pass (pages were still freed), so it counts as reclaim.
// Never throw on a transient write error and tear down the long-lived reduction thread.
const int Status = WriteToFile(RECLAIM_PATH, Bytes.c_str());
Reclaimed = (Status == 0) || (errno == EAGAIN);
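
The excerpt above reads errno after WriteToFile returns, and its own comment notes that WriteToFile logs internally on failure. Whether that helper preserves errno is not visible in this conversation. A minimal standalone sketch that captures errno right after the failing write is shown below; TryWriteReclaim is a hypothetical stand-in, not the repo's WriteToFile. It relies on the documented cgroup v2 behavior that a write to memory.reclaim fails with EAGAIN when fewer bytes were reclaimed than requested.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Hypothetical stand-in for the reclaim write; never throws.
// Returns true on success or when the kernel reports EAGAIN (partial reclaim).
bool TryWriteReclaim(const char* Path, const std::string& Bytes)
{
    const int Fd = open(Path, O_WRONLY | O_CLOEXEC);
    if (Fd < 0)
    {
        return false;
    }

    const ssize_t Written = write(Fd, Bytes.data(), Bytes.size());

    // Capture errno before close() or any logging can overwrite it.
    const int Error = (Written < 0) ? errno : 0;
    close(Fd);

    return (Error == 0) || (Error == EAGAIN);
}
```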
@benhillis benhillis closed this Jun 11, 2026