Hi, thank you for the PR! I haven't checked the code in detail yet, but at a glance it looks fine. I'll answer the questions from #1016 (comment) here:
There is also kind of a second level of utilization that we could somehow show visually - which CPUs are assigned to the given worker. Because right now we show all CPUs of the given PC, and the CPUs assigned to tasks, but in theory the worker might be managing only a subset of the CPUs (though that is probably relatively rare? hard to tell). In fact, maybe we should just get rid of "global" CPU utilization and always just deal with the assigned resources (at least for CPUs, for memory it's more complicated). It could be useful follow-up work to 1) only show the worker-assigned CPUs here, and 2) only gather utilization for those CPUs in the worker collection loop. Another follow-up could be to separate memory utilization across tasks (based on the memory utilization of their processes), but that can be very tricky to pull off, as tasks can spawn subprocesses, etc. CC @spirali about the color scheme (blue vs. graying out unassigned CPUs) and whether you think that we should show all CPUs on the worker, or only those managed by the worker. The latter would be more consistent with how we treat non-CPU resources.
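The follow-up idea mentioned above - gathering utilization only for the CPUs the worker actually manages - could be sketched roughly like this. The function name and data shapes are hypothetical, not HyperQueue's actual types:

```rust
// Hedged sketch: keep only per-CPU utilization samples for worker-managed CPUs.
// `(cpu_id, utilization)` pairs and the `managed` id list are illustrative.
fn managed_utilization(all_cpu_util: &[(u32, f64)], managed: &[u32]) -> Vec<(u32, f64)> {
    all_cpu_util
        .iter()
        .filter(|(id, _)| managed.contains(id)) // drop CPUs the worker does not manage
        .copied()
        .collect()
}
```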
The majority of our users probably have the whole node, but I am aware of some users that do sub-node SLURM allocations. AFAIK they do not usually have multiple workers on a single node; they just have a single worker on a node that manages just a subset of it. Graying out non-managed resources seems good to me. But it should always be possible to see the non-managed CPUs (or ideally all node resources). There are usually two use cases:
Let us say that we will eventually have some "top"-like utility. For the first question, we want to see only the processes spawned by our task. For the second question, we want to see all processes. So we should probably support both views. I have no strong opinion on what a good default is.
Well, the thing is that knowing all the resources of the node is not really something that we can robustly determine. Currently, we sort of try to guess it for CPUs, but even there it might not be accurate. And for GPUs and other resources, we often might only have access to a subset of the resources. I think that from our point of view, we cannot reliably say what the "whole node" is, and should mostly talk only about resources managed by HQ workers. What we could do, though, is say something like "we detected N additional CPUs on the node".
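The auxiliary message suggested above could come from a small helper like this sketch; the function name and exact wording are made up for illustration:

```rust
// Illustrative helper: report CPUs detected on the node beyond those the worker
// manages, or nothing when the worker manages everything we can see.
fn extra_cpus_note(detected: usize, managed: usize) -> Option<String> {
    let extra = detected.saturating_sub(managed);
    (extra > 0).then(|| format!("Detected {extra} additional CPU(s) on the node"))
}
```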
I wrote "in the ideal case"; I know that we cannot provide a generic HW monitoring tool, but other CPUs are quite easy to get, so if we can provide them then I think that we should show them.
I guess that depends on how SLURM is configured; it is definitely possible to hide some CPUs from HyperQueue. But yeah, I guess that we can show all CPUs (that are visible to us), especially since we already do this today 😆 Just that I would treat it more as auxiliary information rather than the main thing that we want to present.
I do agree that not showing the worker-assigned CPU utilization by default is not ideal, but I wanted to keep the original intention and add this as an extra feature. But as you mention above, I agree that showing it as the default makes more sense, and opting out to see "all CPUs" could be an alternative. Also, thinking about it now, if the user wants to see how the whole node (all CPUs in the list) is doing, they surely can do it without the fancy colors. So maybe even keeping it simple, without the switch, might be a good option. So I will change the color scheme to the classic green -> red for the assigned CPUs, gray out the other CPUs, and set that as the default. If you come to a final decision about the possibility to switch between all CPUs <> assigned CPUs, I can remove it or keep it as is. I can try to brainstorm a bit to come up with a new layout to represent the usage statistics. Alternatively, we can discuss it when your schedule frees up.
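The color scheme described here (a green -> red gradient for assigned CPUs, gray for everything else) might look roughly like the following. The RGB mapping is an illustrative sketch, not the actual dashboard code:

```rust
// Illustrative color mapping: assigned CPUs blend linearly from green (idle)
// to red (fully loaded); unassigned CPUs are grayed out regardless of load.
fn cpu_color(utilization: f64, assigned: bool) -> (u8, u8, u8) {
    if !assigned {
        return (128, 128, 128); // gray out CPUs not assigned to the worker's tasks
    }
    let u = utilization.clamp(0.0, 1.0);
    // red channel grows with load, green channel shrinks
    ((255.0 * u) as u8, (255.0 * (1.0 - u)) as u8, 0)
}
```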
👍 on all you said. Regarding the all CPUs <> assigned CPUs switch: you can keep it, but please make the assigned CPUs view the default.
I vote for keeping the "assigned only" view, or at least sorting the assigned CPUs to the beginning. The original motivation for all this was when I needed to find the utilization of "my" 4 CPUs on a 256+ CPU machine.
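The "assigned first" ordering can be done with a stable sort keyed on assignment status, so IDs stay in ascending order within each group. A minimal sketch with hypothetical names:

```rust
// Stable-sort CPU ids so the assigned ones come first; the stable sort
// preserves ascending id order inside each group.
fn sort_assigned_first(cpu_ids: &mut [u32], assigned: &[u32]) {
    // `false` sorts before `true`, so assigned CPUs (negated membership) lead
    cpu_ids.sort_by_key(|id| !assigned.contains(id));
}
```

This way a user's handful of CPUs appears at the top of a long listing instead of being scattered across it.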
As we discussed above, I have grayed out the unused CPUs in the worker CPU utilization view, and it is set as the default. I also added the sorting of the used CPUs, as was mentioned by Ada, because I thought it makes sense on nodes bigger than mine to be able to quickly locate and read the data that I want. In addition, the worker utilization view now shows the usage of the assigned CPUs, and also their count. Let me know if it is OK or not. Idea: it might be misleading to show "(5 CPUS)" if the worker can work with all of them. So maybe {num_used}/{num_available}, as in "(5/16 CPUS)", would be clearer. I am not sure about the change in the title, whether "Node" is the correct thing to call it, as you discussed above. Demonstration of the current state below:
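The {num_used}/{num_available} title proposed here could be formatted with a small helper along these lines; the title text and function name are purely illustrative:

```rust
// Hypothetical title formatter: show assigned vs. available CPU counts
// ("5/16 CPUs") instead of just the assigned count.
fn worker_title(num_assigned: usize, num_available: usize) -> String {
    format!("Worker CPU utilization ({num_assigned}/{num_available} CPUs)")
}
```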
I didn't like the sorting at first, but I guess it makes sense, there's not really much point in seeing the CPUs ordered by their ID (except for hyper-threading). Node utilization sounds good. In the worker utilization, there indeed might be some ways of separating assigned vs managed CPUs, but the space in the title is quite limited. Now that I see the output, I wonder if we should actually have three views:
@spirali What do you think of the three views idea? :) One problem with that is that we cannot easily determine the memory utilization of the worker based on tasks and CPUs. But showing the memory utilization in the title is a bit of a hack anyway, maybe we could show that elsewhere, and keep the CPU view only for CPUs, and not memory?
Adding the extra view shouldn't be a problem. The question is how I get the information about the managed CPUs vs. the node CPUs. This seems to be a hard thing to acquire, AFAIK. Correct me if I am wrong, but right now only the assigned CPUs are 'easy' to distinguish.
I think you already have all the information. The node's list of CPUs is returned from the HW overview. The CPU resources managed by the worker are known; they are stored in the worker configuration. And you also know the CPUs of the tasks that are currently assigned to the worker.
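Given those three sources (HW overview, worker configuration, current task assignments), each detected CPU can be classified for rendering. A sketch with illustrative types, not HyperQueue's actual ones:

```rust
// Hypothetical classification of a CPU for the dashboard view.
#[derive(Debug, PartialEq)]
enum CpuState {
    AssignedToTask, // colored by utilization (green -> red)
    ManagedIdle,    // managed by the worker, currently free
    Unmanaged,      // visible on the node but not given to the worker (grayed out)
}

fn classify_cpu(id: u32, managed: &[u32], assigned: &[u32]) -> CpuState {
    if assigned.contains(&id) {
        CpuState::AssignedToTask
    } else if managed.contains(&id) {
        CpuState::ManagedIdle
    } else {
        CpuState::Unmanaged
    }
}
```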


This PR adds an option to show utilization with distinguished CPUs that are used by running pinned tasks assigned to a worker. Without pinning, the feature currently doesn't work well.
Currently, a change of color scheme is used to distinguish the CPUs, but graying out the other CPUs might be better.
The 'c' keybinding was added, which enables the user to toggle between global and worker-specific CPU usage.
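The state behind that toggle might be modeled as in this illustrative sketch (the enum and its variants are made up, not the PR's actual types):

```rust
// Hypothetical view state flipped by the 'c' keybinding.
#[derive(Debug, PartialEq, Clone, Copy)]
enum CpuView {
    Global,         // all CPUs visible on the node
    WorkerAssigned, // only CPUs assigned to the worker's tasks
}

fn toggle_view(view: CpuView) -> CpuView {
    match view {
        CpuView::Global => CpuView::WorkerAssigned,
        CpuView::WorkerAssigned => CpuView::Global,
    }
}
```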
Features that might be connected with this:
This is part of the feature requested in #1016