feat(platform): CBM_WORKERS env override for cbm_default_worker_count by yangsec888 · Pull Request #364 · DeusData/codebase-memory-mcp

yangsec888 · 2026-05-19T14:03:06Z

What

Adds a single env-knob, CBM_WORKERS, that explicitly sets the
worker count returned by cbm_default_worker_count(). When unset,
behaviour is unchanged.

Why

In containerized deployments, cbm_default_worker_count(initial=true)
returns sysconf(_SC_NPROCESSORS_ONLN) (host-CPU count, not the
container's CPU quota). On a 1-vCPU pod scheduled onto a 16-core
node cbm spawns ~16 indexing workers, each with its own per-worker
buffers — the dominant OOMKill driver in container deployments
(see companion issue #363 for the broader cgroup-awareness ask).

A small env knob solves the immediate operator pain without taking
on the scope of full cgroup detection, and it follows the same
shape as the existing CBM_* env overrides: explicit override >
implicit detection.

Patch

src/foundation/system_info.c:

int cbm_default_worker_count(bool initial) {
    /* CBM_WORKERS env override (clamped to [1, CBM_WORKERS_MAX]).
     * Useful inside containers where sysconf(_SC_NPROCESSORS_ONLN)
     * reports host CPUs rather than the cgroup's effective CPU quota.
     * Same precedence shape as other CBM_* env overrides:
     * explicit override > implicit detection. */
    char buf[CBM_SZ_32];
    if (cbm_safe_getenv("CBM_WORKERS", buf, sizeof(buf), NULL) != NULL) {
        long n = strtol(buf, NULL, CBM_DECIMAL_BASE);
        if (n >= MIN_WORKERS && n <= CBM_WORKERS_MAX) {
            return (int)n;
        }
        cbm_log_warn("workers.env.invalid", "value", buf,
                     "fallback", "sysconf");
    }

    cbm_system_info_t info = cbm_system_info();
    // ... unchanged from today ...
}

CBM_WORKERS_MAX = 256 is added to the file-local enum; happy to
drop the upper clamp if you'd rather trust the operator, or move
the constant somewhere shared. cbm_safe_getenv, CBM_DECIMAL_BASE,
CBM_SZ_32, SKIP_ONE, MIN_WORKERS, and cbm_log_warn all
already exist (see src/foundation/platform.h, constants.h, and
log.h).

Tests

tests/test_platform.c gains three cases inside the existing
SUITE(platform):

platform_default_workers_env_override — CBM_WORKERS=4 is honored for both initial=true and initial=false.
platform_default_workers_env_invalid — 0, -1, 9999, and not-a-number all fall back to the sysconf-derived default (and emit workers.env.invalid at WARN).
platform_default_workers_env_unset — when unset, cbm_default_worker_count(true) == cbm_system_info().total_cores, preserving today's contract.

Docs

One-line addition to the Environment Variables table in
README.md:

CBM_WORKERS — Override the parallel-indexing worker count
returned by cbm_default_worker_count. Useful inside containers
where sysconf(_SC_NPROCESSORS_ONLN) reports host CPUs rather
than the cgroup's effective quota. Range 1–256; invalid values
are ignored with a warning.

Behaviour when unset

Identical to today. No deployment that doesn't set CBM_WORKERS
sees any difference.

Verification on my side

I have not run scripts/build.sh / scripts/test.sh /
scripts/lint.sh locally because the full ASan + UBSan setup
isn't installed on my machine. Happy to iterate quickly on any
CI feedback (lint, clang-tidy, cppcheck, format) — the patch is
small enough that a turnaround should be fast.

Companion issue Linux: cbm_system_info / cbm_default_worker_count don't respect cgroup CPU/memory limits #363 — full cgroup-aware detection (the broader
ask this PR carves a smaller, lower-risk path out of).
Other crash bugs that container-overspawn magnifies: Dump-phase crash on large TS monorepo (v0.6.1, darwin-arm64) — pipeline completes through semantic_edges then terminates abnormally before gbuf.dump #317, Silent index corruption after rapid kill/restart cycles #334,
Segmentation fault (core dumped) #336, Segmentation fault at start of lsp_cross pass on repositories with 200+ files #340.
Downstream context: filed while integrating CBM as a long-lived
MCP-server process inside a 2 GiB Kubernetes pod. Backend pod's
current workaround is a memory-limit bump from 2 GiB → 4 GiB to
ride out the overspawn until this lands.

Adds a single env-knob, CBM_WORKERS, that explicitly sets the worker count returned by cbm_default_worker_count(). When unset, behaviour is unchanged. In containerized deployments, cbm_default_worker_count(initial=true) returns sysconf(_SC_NPROCESSORS_ONLN) (host-CPU count, not the container's CPU quota). On a 1-vCPU pod scheduled onto a 16-core node cbm spawns ~16 indexing workers, each with its own per-worker buffers — the dominant OOMKill driver in container deployments. The broader ask (full cgroup awareness for cbm_system_info / mem.init budget logs) lives in a sibling issue; this knob is the smaller, lower-risk path that ships independently. Same precedence shape as the existing CBM_* env overrides: explicit override > implicit detection. Range clamped to [1, 256]; invalid values are ignored and the function logs `workers.env.invalid` at WARN before falling through to the sysconf-derived default. Files touched: - src/foundation/system_info.c — env probe + clamp + warn-and-fall-through - tests/test_platform.c — 3 new TEST cases covering override, invalid fallback, and unset baseline - README.md — CBM_WORKERS row in the Environment Variables table Related: cgroup-aware detection issue (filed alongside this PR).

DeusData · 2026-05-30T15:47:20Z

Thank you, @yangsec888! 🙏 The CBM_WORKERS escape hatch is a clean, well-scoped fix for the container over-provisioning problem — cbm_safe_getenv + strtol + clamp to [1,256] with a warn-and-fall-back on invalid values is exactly the right shape, and it matches the precedence of the other CBM_* overrides. Your three tests (override / invalid / unset) cover it well.

I landed it as d952238, crediting you as the commit author (it sequences right before your cgroup-detection work). Verified locally: build clean, all tests pass and the workers.env.invalid warnings fire as expected. Thanks for tackling the container OOMKill story! 🙏

Adds a CBM_WORKERS env knob (clamped to [1, 256], invalid values warned and ignored) that explicitly sets the parallel-indexing worker count. In containers, sysconf(_SC_NPROCESSORS_ONLN) reports host CPUs rather than the cgroup's effective quota, so a 1-vCPU pod on a 16-core node otherwise spawns ~16 workers — the dominant OOMKill driver. When unset, behaviour is unchanged. From #364.

Inside a container, sysconf(_SC_NPROCESSORS_ONLN) and sysinfo() report the host's CPU count and RAM, not the cgroup's effective quota — so cbm_default_worker_count over-provisions workers and the mmap budget can exceed the cgroup memory cap, driving OOMKills (#363). detect_system_linux now reads the cgroup limits: - cbm_detect_cgroup_cpus: cgroup v2 cpu.max, then v1 cpu.cfs_quota_us / cpu.cfs_period_us → ceil(quota/period); -1 when unlimited. - cbm_detect_cgroup_mem: cgroup v2 memory.max, then v1 memory.limit_in_bytes; 0 when unlimited or the v1 near-ULLONG_MAX sentinel. - Effective value is min(cgroup, host), guarding against mis-mounted cgroups that report more than the host. Helpers read via a bounded read_small_file (fopen "re", capped fread) and are exposed through system_info_internal.h for unit tests that drive them against a fake cgroup tree. The test teardown uses opendir/unlink/ rmdir (no shell spawn). Distilled from #365 onto current main (unioned the test additions with the CBM_WORKERS tests from #364, and replaced the test-cleanup system("rm -rf") with a shell-free recursive remove). Closes #363 together with #364.

yangsec888 force-pushed the feat/cbm-workers-env-override branch from 3e1f80b to 80be86f Compare May 19, 2026 14:39

yangsec888 mentioned this pull request May 19, 2026

feat(platform): cgroup-aware CPU/memory detection in detect_system_linux #365

Closed

DeusData closed this May 30, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(platform): CBM_WORKERS env override for cbm_default_worker_count#364

feat(platform): CBM_WORKERS env override for cbm_default_worker_count#364
yangsec888 wants to merge 1 commit into
DeusData:mainfrom
yangsec888:feat/cbm-workers-env-override

yangsec888 commented May 19, 2026

Uh oh!

DeusData commented May 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

yangsec888 commented May 19, 2026

What

Why

Patch

Tests

Docs

Behaviour when unset

Verification on my side

Related

Uh oh!

DeusData commented May 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants