Expose remaining cgroup v2 memory.events counters (oom, oom_kill, oom_group_kill, low) #3889

Open
note89 wants to merge 3 commits into google:master from note89:note89/expose-memory-events-oom
note89 commented Jun 11, 2026

What:

Follow-up to #3870. Expose the remaining cgroup v2 memory.events counters as Prometheus metrics:

  • container_memory_events_oom_total
  • container_memory_events_oom_kill_total
  • container_memory_events_oom_group_kill_total
  • container_memory_events_low_total (added on request from @egorikas)

With these, all six memory.events counters (low, high, max, oom, oom_kill, oom_group_kill) are exposed.

Why:

memory.events already contains these counters on cgroup v2 systems. Exposing them gives operators direct cgroup-level visibility into OOM conditions, OOM kills, and reclaim that happens even though usage is under the memory.low boundary. This is separate from the existing container_oom_events_total metric, which comes from the OOM event path rather than from the cgroup v2 memory.events counters.

How:

  • Read low, oom, oom_kill, and oom_group_kill in the existing setMemoryEvents path.
  • Add the fields to info/v1.MemoryEvents and emit them under the existing memory metric set.
  • Update unit tests, integration metric checks, docs, and Prometheus golden files.

google-cla Bot commented Jun 11, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Surface the low counter from memory.events alongside the OOM counters,
as container_memory_events_low_total. Requested in PR feedback.
note89 changed the title from "Expose cgroup v2 memory.events OOM metrics" to "Expose remaining cgroup v2 memory.events counters (oom, oom_kill, oom_group_kill, low)" Jun 11, 2026
note89 commented Jun 11, 2026

Author

@egorikas I saw your comment, added low, tnx!

@egorikas

Contributor

> @egorikas I saw your comment, added low, tnx!

Oops, I've somehow deleted it :) Thanks for adding a metric per my request :)

Align metric descriptions with kernel cgroup v2 semantics:

- low counts reclaim despite usage being under the memory.low
  boundary, so "reclaim events" instead of "breach events".
- oom counts entering the OOM state, not page allocation faults;
  the previous wording was nearly identical to the existing
  container_memory_failures_total help text.